Abstract

This paper analyzes a steady state matching model interrelating the
education and labor sectors. In this model, a heterogeneous population of
students match with teachers to enhance their cognitive skills. As adults,
they then choose to become workers, managers, or teachers, who match in the
labor or educational market to earn wages by producing output. We study the
competitive equilibrium which results from the steady state requirement
that the educational process replicate the same endogenous distribution of
cognitive skills among adults in each generation (assuming the same
distribution of student skills). We show such an equilibrium can be found
by solving an infinite-dimensional linear program and its dual. We analyze
the structure of our solutions, and give sufficient conditions for them to
be unique. Whether or not the educational matching is positive assortative
turns out to depend on convexity of the equilibrium wages as a function of
ability, suitably parameterized; we identify conditions which imply this
convexity. Moreover, due to the recursive nature of the education market,
it is a priori conceivable that a pyramid scheme leads to greater and
greater discrepancies in the wages of the most talented teachers at the top
of the market. Assuming each teacher teaches $N$ students, and contributes
a fraction $\theta \in {]0,1[}$ to their cognitive skill, we show a phase
transition occurs at $N\theta = 1$, which determines whether or not the
wage gradients of these teachers remain bounded as market size grows, and
make a quantitative prediction for their asymptotic behaviour in both
regimes: $N\theta \ge 1$ and $N\theta < 1$.

*RJM thanks the University of Nice Sophia-Antipolis, the Becker-Friedman
Institute for Economic Research, and the Stevanovich Center for Financial
Mathematics at the University of Chicago for their kind hospitality during
various stages of this work. He acknowledges partial support of his
research by Natural Sciences and Engineering Research Council of Canada
Grant 217006-08, and, during Fall 2013 while he was in residence at the
Mathematical Sciences Research Institute in Berkeley, California, by the
National Science Foundation under Grant No. 0932078 000. We are grateful to
Gary Becker, Yann Brenier and Rosemonde Lareau-Dussault for fruitful
conversations. ©2014 by the authors.

†Laboratoire Dieudonné, Université de Nice, Parc Valrose, 06108 Nice
Cedex 2, France, alice.erlinger@gmail.com

‡Department of Mathematics, University of Toronto, Toronto, Ontario
M5S 2E4, Canada, mccann@math.toronto.edu

§Department of Economics, University of Toronto, xianwen.shi@utoronto.ca,
siow@chass.utoronto.ca, ronald.wolthoff@utoronto.ca


1 Introduction 

It is an economic truism that prices are determined primarily by what the
market will bear. For example, executive compensation in large firms may
appear excessive when measured against average employee wages, but is
often justified by arguing that it is determined competitively by the
market. To understand what levels of compensation a large market will or
won't bear, it is therefore tempting to ask questions such as: Can the ratio
of the CEO's wage to the average wage in a firm be expected to tend to
infinity or to a finite limit, as the size of the firm grows without bound?
The answer to such a question may be expected to depend on various aspects
of the organization of the firm, such as the number of levels of management
separating the CEO from the average worker, and the number of managers at
each level. This organizational structure may itself be determined by market
pressures, within the constraints of feasible technology.

In this paper we investigate an analogous question set in the context of
the education market, rather than that of a firm. That is, we investigate
how the wages of the most sought-after gurus relate to those of the average
teacher. The education market is special in various ways. It is stratified
into many different levels or streams which interact with each other, with a
range of qualities available in every stream. Moreover, what it produces is
human capital, the value of which is determined by the broader market for
skills of which the education market is itself a small part. Thus there is a
feedback mechanism in the education market, owing to the fact that those
individuals who choose to become teachers participate at least twice in the
market: first as consumers and later as producers, putting to work the skills
previously acquired in this market to generate human capital for the next
generation. It is this feedback mechanism which is responsible for many of
the results we describe; it leads to the formation of an educational analog
of a pyramid scheme, in which teachers at each level of the pyramid attempt
to extract as much as they can from their students' future earnings, in the
form of tuition. The question this time is whether or not the large market
limit leads to wages which display singularities at the apex of the pyramid.

We address this question using a variant of a steady state matching model
introduced by four of us to analyze the coupling of the education and labor
markets [16]. We proposed this model not only to provide a microeconomic
foundation which allows one to compare and contrast different sectors, but
also to examine interdependencies and the different roles played by
communication and cognitive skills in each of them. An unexpected conclusion
was that, as in much simpler (single stage, single sector) models [18] [10]
[1], competitive equilibrium matching patterns for a heterogeneous steady
state population can be found as the optimal solution to a planner's problem
taking the form of a linear program; see also [5]. The questions raised in
the present manuscript will be addressed through a rigorous analysis of the
resulting linear program and its solutions, including criteria for
existence, uniqueness, singularities, and a detailed description of the
matching patterns which can arise. A remarkable feature is that this simple
model leads to the emergence of a hierarchical structure in the education
sector, with fewer and fewer individuals at the top of the market earning
higher and higher wages. A detailed exploration of this structure proves
necessary to determine the conditions under which these wages display
singularities. An analogous hierarchy was explored by Becker and Murphy in
the context of a steady growth model [2, §VII] quite different from ours.

The education market is also unusual in many ways that our model does
not capture. For example, non-pecuniary considerations are important for
both teachers and students, and schools are often not operated on a
for-profit basis; however, in our model we assume all participants maximize
their expected monetary payoff. In addition, education markets (tuitions,
for example) are heavily regulated, but here we abstract away all regulatory
restrictions. The goal of this paper, therefore, is not to provide a
realistic account of how teachers' compensations are determined in the
market, but rather to elucidate a feedback mechanism that is potentially
important in determining wage compensation in education and other markets,
and to provide a tool to solve matching models that incorporate this
feedback mechanism, with the potential to encompass multi-dimensional
individual attributes.

In the present model, we assume that communication skills are homogeneous
over the entire population, and hence deal with a population having a
single dimension of heterogeneity plus parameters, rather than the multiple
dimensions of heterogeneity in [16]. Hence, the model here can be viewed as a
limiting case of multidimensional models in which the range of heterogeneities
becomes narrow in all dimensions but one. There are two benefits from this
simplifying assumption. First, it greatly simplifies our analysis. Second, the
resulting model is a minimal departure from the classical matching model
with one dimension of heterogeneity. We will show that this small departure
actually generates results very different from the standard one-dimensional
models of e.g. Lucas [12] or Garicano [8].

As in [16], we model communication skills as the number of students a
teacher can teach or the number of workers a manager can manage, which
is often referred to as “span of control”. In particular, we assume that each
teacher can teach $N > 1$ students. We use $\theta \in {]0,1[}$ to represent
the extent to which a teacher's cognitive skills get transmitted to each of
their students. Similarly, $N' > 0$ and $\theta' \in {]0,1[}$ represent the
number of workers each manager can manage in the labor market, and the extent
to which a manager's cognitive skill enhances the productivity of his or her
workers. All market participants have the same $N$ in the education market
and the same $N'$ in the labor market, but they differ in their cognitive
skills $k$, which are assumed to be continuously distributed over the
interval $\bar K := [\underline k, \bar k] \subset \mathbf{R}$. As a result,
the linear program is infinite-dimensional, and the analysis is complicated
by a lack of a priori bounds which could be used to show that equilibrium
wages or payoffs exist for the model. Moreover, a pyramid can form in the
education sector, enhancing the wages of the most skilled teachers. It is not
obvious whether or not this pyramid structure can lead to unbounded wage
behavior. Our analysis suggests that it does not, but that it leads to
unbounded wage gradients instead.

We begin by elucidating a convexity property which allows us to derive
the existence of equilibrium wages as solutions to an (infinite-dimensional)
linear program. This convexity is reminiscent of that discussed by Rosen
in his investigation of superstars [17]. More surprisingly, after addressing
uniqueness and properties of these wages and the matches they induce, we
go on to show that the model exhibits a phase transition, depending on
the product of each teacher's capacity $N$ for students times their teaching
effectiveness $\theta$: the wage gradients diverge at the highest skill type
if and only if $N\theta \ge 1$. When $N\theta > 1$, the divergence is
proportional to a negative power of $|\bar k - k|$ as $k \to \bar k$. Only by
integrating this divergence can we conditionally show wages tend to a finite
limit at $\bar k$ which, in the large market limit, becomes independent of
the size of the population being modelled.

Although wage singularities for teachers may appear counter-factual, or at
least modest compared to wage singularities for managers in the real world,
this discrepancy between prediction and observation is easily explained by
the fact that our model allows for only one layer of managers but a
potentially unbounded number of layers of teachers. Thus a top teacher
improves the cognitive skills of each of their $N$ students who go on to be
top teachers or managers. A good manager improves the productivity of each
of their $N'$ supervised workers. Thus, already in a two-layer hierarchy, a
top teacher indirectly makes a large number $N \times N'$ of workers more
productive. Since the number of layers of the educational hierarchy is
endogenous to the model and can be very large, the impact of gurus on the
productivity of their direct and indirect students and workers can
accumulate very substantially.

The term phase transition is borrowed from statistical physics, where it
refers to a sharp threshold in parameters (such as temperature) separating
qualitatively different behavior (such as liquid from solid). In that context,
the non-smoothness arises from a continuum limit which admits approximation
by finite dimensional models depending smoothly on the same parameter(s).
By analogy, if our continuum of agent types could be approximated using
finitely many agent types, we would expect to restore smooth dependence on
the parameters $N$ and $\theta$, but this smoothness (i.e. the wage
gradients) would not admit control uniform in the number of types. In
statistical physics, it is often the case that the critical exponents of the
singularities (such as the one above) do not vary over a wide class of
models, a phenomenon known as universality. In the present context, we
observe that the exponent governing growth of the wage gradients is
universal in the sense that it does not depend on various details of the
model, such as the exact form of the production functions, or the input
distribution of student skills, at least within the classes of such data
considered hereafter.

The remainder of this manuscript is organized as follows. In the first
section and subsections we lay out the model, and its variational
reformulation in terms of a planner's problem and its dual. We have argued in
[16] that solutions to these infinite-dimensional linear programs represent
competitive equilibria; see also the announcement [15]. In a second section
and subsections we address the existence, uniqueness and properties of these
solutions. Even the existence of equilibrium wages in this model is rather
non-trivial, and goes beyond the range of validity of any statement of the
second welfare theorem that we know. Standard arguments concerning existence
of an optimal matching and absence of a duality gap are relegated to an
appendix, which is logically independent of the rest of the analysis. Lemma
14 is also logically independent of the remaining analysis, and its first
assertion is actually required at some earlier points in the text.

1.1 The model: competitive equilibria 

Let us begin by describing our unidimensional variant of the model first
introduced in [16]. Consider an economy populated by risk-neutral individuals
who each live for two periods. When young, individuals enter the education
market as students. In the subsequent period, as adults, they enter the labor
market to become teachers in schools, or workers or managers in firms. Both
the education market and the labor market are competitive. There is free
entry for both schools and firms. Hence, the tuition fees a school collects
from students are just enough to cover the wage of its teacher, and a firm's
output exactly covers the wages of its employees (workers and managers).
Individuals do not discount the future. The lifetime net payoff of each
individual is equal to the sum of their labor market plus non-labor market
earnings minus tuition costs. Individuals choose what occupation to pursue
and whom to match with in each of the two markets to maximize their net
payoffs.

Each individual is endowed with two kinds of skills, a communication skill
($N > 1$ or $N' > 1$) which is fixed throughout their lifetime, and an initial
cognitive skill $a$ which can be augmented through education. As in [16], we
assume that individuals differ in their initial cognitive skills $a$. In
contrast to [16], we assume that individuals share the same communication
skills. By attending school in the first period, individuals can augment
their initial cognitive skill $a$ to their adult cognitive skill $k$. Let
$A = [\underline a, \bar a[$ with $-\infty < \underline a < \bar a < +\infty$
denote the range of students' initial cognitive skills $a$, and
$K = [\underline k, \bar k[$, or rather its closure $\bar K$, the range of
adult human capital $k$. Ability or human capital refers to cognitive skill
in both cases, and we occasionally use the variable names $a$ and $k$
interchangeably for convenience. For the model discussed here, taking
$K = A$ will not cost any generality, nor will the normalization
$\underline a = \underline k = 0$.

The production functions in the education market and in the labor market
are described as follows. We assume the cognitive skill $z(a,k)$ acquired by
a student of ability $a \in A$ who studies with a teacher of ability
$k \in K$ is given by the weighted average $z(a,k) = (1-\theta)a + \theta k$
of their abilities, with weight $\theta \in {]0,1[}$. We also assume the
productivity $b_L((1-\theta')a + \theta' k)$ of a worker with adult cognitive
skill $a$ supervised by a manager of skill $k$ is given by a convex
increasing function $b_L \in C^1(\bar K)$ of another such average, this time
with weight $\theta' \in {]0,1[}$. Notice that abilities $a$ and $k$ here are
measured on a logarithmic scale relative to the conventions of [16], a
reparameterization which is crucial for exposing the sense in which the
equilibrium wages may turn out to be convex.
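
To make the two technologies concrete, the following minimal Python sketch
evaluates the education map $z(a,k)$ and the labor productivity
$b_L((1-\theta')a + \theta'k)$. The function names, the choice $b_L = \exp$
and the numerical values are assumptions made purely for illustration; the
text only requires $b_L$ to be convex and increasing.

```python
import math

def z(a, k, theta):
    """Adult skill of a student of ability a taught by a teacher of ability k."""
    if not 0.0 < theta < 1.0:
        raise ValueError("theta must lie in the open interval ]0,1[")
    return (1.0 - theta) * a + theta * k

def labor_output(a, k, theta_prime, b_L=math.exp):
    """Productivity b_L((1-theta')a + theta'k) of worker a under manager k.

    b_L = exp is an illustrative convex increasing choice (an assumption).
    """
    return b_L(z(a, k, theta_prime))

# Hypothetical numbers: a student at 0.2 taught by a teacher at 0.8.
skill = z(0.2, 0.8, 0.5)   # weighted average of the two abilities
same = z(0.6, 0.6, 0.5)    # equal abilities are reproduced: z(a, a) = a
output = labor_output(0.0, 0.0, 0.5)
```

Since $z$ is a weighted average, the acquired skill always lies between the
student's and the teacher's abilities, which is why the skill range can be
taken to be the same interval for students and adults.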

We allow for the possibility that the cognitive skill $z$ attained through
education has value $c\,b_E(z)$ in addition to the wage earning potential it
confers, where $c \ge 0$ is a dimensionless parameter and
$b_E \in C^1(\bar A)$ is another convex increasing function. The choice
$b_E(k) = e^k = b_L(k)$ with $\theta = \frac12 = \theta'$ corresponds to the
motivating example from [16]; more generally we assume $b_E$ and $b_L$ and
their first two derivatives have positive lower bounds

    $0 < \underline b_{E/L} = b_{E/L}(0)$,    (1)

    $0 < \underline b'_{E/L} = b'_{E/L}(0)$,    (2)

    $0 < \underline b''_{E/L}$,    (3)

where $\underline b''_{E/L}$ is defined as the largest constant for which
$b_{E/L}(k) - \underline b''_{E/L}\,k^2/2$ is convex on $K$. We hope strict
positivity of the analogous quantities will be inherited by the equilibrium
payoffs $u$ and $v$.

Notice that what is being produced in each sector is different: in the
labor and non-labor sectors we have not specified the services or goods which
are being produced, except that they take adult cognitive skills as their
input (communication skills entering through possible dependence of $c$ on
parameters such as $N$ and $\theta$); in the education sector it is adult
cognitive skills which are being produced, taking student and teacher
cognitive skills as their inputs. The dimensionless constant $c \ge 0$
measures the non-labor utility, if any, of individual attainment of cognitive
skills relative to labor productivity; it replaces the marital utility used
in early drafts of [16].

Let a probability measure $\alpha \ge 0$ on $\bar A$ represent the exogenous
distribution of student abilities, and let $\operatorname{spt}\alpha$ denote
the smallest closed subset of $\bar A$ carrying the full mass of $\alpha$.
Taking $A$ smaller if necessary ensures $\operatorname{spt}\alpha$ contains
both $\underline a$ and $\bar a$. Our problem is to find a pair of Borel
measures $\epsilon \ge 0$ on $\bar A \times \bar K$ and $\lambda \ge 0$ on
$\bar K \times \bar K$ such that $\epsilon$ represents the educational pairing
of students with teachers, and $\lambda$ represents the labor pairing of
workers with managers, along with a pair of payoffs or wage functions
$u, v : \bar K \to [0,\infty]$ representing the net lifetime expected utility
$u(a)$ of a student with ability $a$, and the wage $v(k)$ paid to an adult of
ability $k$, which together constitute a competitive equilibrium
$(\epsilon,\lambda,u,v)$. Roughly speaking, this means the matchings
$\epsilon,\lambda$ must clear the market at each generation in a steady
state, and the payoffs $u$ and $v$ must be large enough to be stable, yet
small enough that in combination with $(\epsilon,\lambda)$ they satisfy a
budget constraint.

Since we are interested in a steady state model, we assume the distribution
of student abilities $\alpha$ on $\bar A$ is the same at each generation, and
coincides with the left marginal

    $\epsilon^1 = \alpha$    (4)

of the educational pairing $\epsilon \ge 0$ of student and teacher abilities.
Here $\epsilon^1 = \pi^1_\#\epsilon$ and $\epsilon^2 = \pi^2_\#\epsilon$
denote the left and right projections of $\epsilon$ through $\pi^1(a,k) = a$
and $\pi^2(a,k) = k$, representing the respective distributions of student
and teacher abilities. Similarly $\lambda^1$ and $\lambda^2$ will denote the
left and right marginals of the labor pairing $\lambda$, representing the
distributions of worker and manager skills. The steady state constraint
requires that the educational pairing $\epsilon$ of students with adults
reproduce the current distribution of adult skills at the next generation:

    $\lambda^1 + \frac{1}{N'}\lambda^2 + \frac{1}{N}\epsilon^2 = \kappa$,    (5)

where the expression on the left represents the sum of the current
distributions of worker, manager and teacher skills; the latter two have been
scaled by $1/N'$ and $1/N$ respectively, to reflect the fact that each
manager manages $N'$ workers, and each teacher teaches $N$ students, so
comparatively fewer managers and teachers are required. The symbol
$\kappa := z_\#\epsilon$ on the right represents the distribution of future
adult skills resulting from the educational pairing $\epsilon$; it is given
by the push-forward of $\epsilon$ through the map
$z : \bar A \times \bar K \to \bar K$ representing the educational
technology, and assigns mass $\kappa[B] := \epsilon[z^{-1}(B)]$ to each set
$B \subset \bar K$.

The marginal constraint (4) forces $\epsilon$ and hence
$\kappa = z_\#\epsilon$ to be probability measures, like $\alpha$. The
workers form a fraction $(1 - \frac{1}{N})/(1 + \frac{1}{N'})$ of the
population, coinciding with the total mass of $\lambda$. The restriction
$K = A$ costs no generality, since we are in a steady state, and since our
education technology satisfies $z(a,a) = a$, whence
$z(\underline k,\underline k) = \underline k$ and
$z(\bar k,\bar k) = \bar k$.
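
As a small illustration of the mass balance behind the steady state
constraint, the sketch below computes the occupational shares of a unit mass
of adults. It assumes only the bookkeeping described above: each teacher
teaches $N$ students, each manager manages $N'$ workers, and the three
occupations exhaust the adult population. The function name and the sample
values of $N$ and $N'$ are hypothetical.

```python
from fractions import Fraction

def occupation_shares(N, N_prime):
    """Shares of workers, managers and teachers in a unit adult population.

    Mass balance (an assumption of this sketch):
        workers + workers / N' + 1 / N = 1.
    """
    N, N_prime = Fraction(N), Fraction(N_prime)
    teachers = 1 / N                              # N students per teacher
    workers = (1 - teachers) / (1 + 1 / N_prime)  # solves the balance above
    managers = workers / N_prime                  # N' workers per manager
    return workers, managers, teachers

# Hypothetical spans of control: N = 20 students, N' = 4 workers.
workers, managers, teachers = occupation_shares(20, 4)
```

Exact rational arithmetic is used so that the three shares can be checked to
sum to one without rounding error.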

Letting $v(k)$ denote the wage commanded by an adult of skill $k$, and $u(a)$
the net lifetime utility of a student of ability $a$, both must satisfy the
stability conditions

    $u(a) + \frac{1}{N}\,v(k) \ge c\,b_E(z(a,k)) + v(z(a,k))$ and    (6)

    $v(a) + \frac{1}{N'}\,v(k) \ge b_L((1-\theta')a + \theta' k)$ on $\bar A \times \bar K$.    (7)

The constraint (7) enforces stability of matchings in the labor sector. If the
reverse inequality held, $N'$ adults with skills $a$ and one with skill $k$
would abandon their occupations to form $N'$ worker-manager pairs each
producing enough output to improve all $N' + 1$ adults' wages. Similarly (6)
is a stable matching condition for the education sector. The lifetime net
utility of a student with cognitive skill $a$ plus the tuition $v(k)/N$ paid
by each student of a teacher with skill $k$ must be at least $a$'s lifetime
earnings plus any other benefits derived from cognitive skills which would
have resulted had he (and $N - 1$ of his clones) chosen to study with $k$. We
can also regard the stability constraints (6)–(7) as combining to ensure each
adult of type $k$ in the population chooses the profession (worker, manager,
or teacher) and partners (manager, workers, or students, respectively) which
maximize their wage $v(k)$ on the labor market.

Finally, the budget constraint asserts that equality holds $\epsilon$-a.e. in
(6), and $\lambda$-a.e. in (7). In other words, the productivity
$b_L((1-\theta')a + \theta'k)$ of $\lambda$-a.e. manager-worker pair $(a,k)$
which actually forms is exactly sufficient to pay the worker's wage plus a
fraction $1/N'$ of the manager's salary. Similarly, $\epsilon$-a.e.
student-teacher pairing $(a,k)$ which forms must produce an adult whose
earnings $v(z(a,k))$, supplemented by any additional utility
$c\,b_E(z(a,k))$ derived from the skill $z(a,k)$ he acquires, exactly cover
his net lifetime utility $u(a)$ together with his tuition, which equals his
share $v(k)/N$ of his teacher's earnings.
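
The stability and budget conditions can be checked numerically for candidate
payoffs. The sketch below computes the slack (left side minus right side) in
the education and labor stability conditions: stability requires non-negative
slack for every pair, and the budget constraint requires zero slack on the
pairs that actually form. The candidate wage functions, the choice
$b_E = b_L = \exp$, the grid and all parameter values are assumptions chosen
only to exercise the formulas.

```python
import math

def education_slack(u, v, a, k, N, theta, c, b_E):
    """u(a) + v(k)/N - [c*b_E(z) + v(z)] with z = (1-theta)a + theta*k."""
    z = (1.0 - theta) * a + theta * k
    return u(a) + v(k) / N - (c * b_E(z) + v(z))

def labor_slack(v, a, k, N_prime, theta_prime, b_L):
    """v(a) + v(k)/N' - b_L((1-theta')a + theta'k)."""
    mix = (1.0 - theta_prime) * a + theta_prime * k
    return v(a) + v(k) / N_prime - b_L(mix)

def min_slack(slack, grid):
    """Smallest slack over all (worker, manager) pairs on the grid."""
    return min(slack(a, k) for a in grid for k in grid)

grid = [i / 10 for i in range(11)]            # skills 0.0, 0.1, ..., 1.0

def v_high(k):                                # hypothetical generous wage
    return math.exp(k)

def v_low(k):                                 # hypothetical wage, too small
    return 0.1 * math.exp(k)

stable = min_slack(lambda a, k: labor_slack(v_high, a, k, 2, 0.5, math.exp), grid)
unstable = min_slack(lambda a, k: labor_slack(v_low, a, k, 2, 0.5, math.exp), grid)
edu = education_slack(lambda a: 5.0, v_high, 0.0, 0.0, 2, 0.5, 1.0, math.exp)
```

In this example the generous wage is stable on the grid but leaves strictly
positive slack everywhere, so it would violate the budget constraint; the low
wage fails stability.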

To complete the specification of the model, we need to say in what class of
functions the payoffs $u, v$ must lie. Since we wish to allow for the
possibility that the payoffs $u,v : \bar K \to [0,\infty]$ become unbounded
at the upper end $\bar k$ of the skill range, it is convenient to define
$A = K = [0,\bar k[$ as a half-open interval. We shall consider payoffs from
the feasible set $F_0$ consisting of pairs $(u,v) = (u_0 + u_1, v_0 + v_1)$
satisfying (6)–(7) which differ from bounded continuous functions
$u_0, v_0 \in C(\bar A)$ by non-decreasing functions
$u_1, v_1 : \bar A \to [0,\infty]$. If either payoff takes extended real
values, we also require

    $\frac{N}{N-1}\,(u(k) - c\,b_E(k)) \ge v(k) \ge \frac{N'}{N'+1}\,b_L(k) > 0$ on $\bar K$,    (8)

which otherwise follows from taking $a = k$ in (6)–(7). We often require $u$
and $v$ to be proper, meaning lower semicontinuous and not identically
infinite. This costs little generality, since when (6)–(8) hold for
non-negative functions $(u,v)$, they continue to hold if $u$ and $v$ are
replaced by their lower semicontinuous hulls.

A competitive equilibrium refers to a pair of measures
$\epsilon,\lambda \ge 0$ and functions $(u,v) \in F_0$ satisfying (4)–(5) plus
the budget constraint

    equality holds $\epsilon$-a.e. in (6), and $\lambda$-a.e. in (7)    (9)

relating $(\epsilon,\lambda)$ to $(u,v)$. The economic idea behind this
definition is that no individual agent (nor any group of agents which is
small relative to the size of the market) can improve their outcome by
choosing to match otherwise than as prescribed by $\epsilon$ and $\lambda$.
Here $\epsilon$ represents an assignment of $N$ students to each teacher, and
reproduces the current distribution of adult skills in the next generation,
starting from the given distribution $\alpha$ of student skills and
educational technology $z(a,k) = (1-\theta)a + \theta k$; the future earnings
plus any non-labor utility received by the $N$ students exactly add up to
their net lifetime utilities, plus the salary of the teacher. Similarly,
$\lambda$ represents an assignment of $N'$ workers to each manager, the
productivity of these worker-manager teams exactly sufficing to pay the
respective wages of each team member. Both the educational and the labor
markets clear, and the stability constraints guarantee no adult would prefer
an occupation other than the one he or she has been assigned, nor to work
with anyone other than the partners prescribed by
$(a,k) \in \operatorname{spt}\lambda$ in the case of workers or managers, or
by $\epsilon$ in the case of teachers. Similarly, each pair
$(a,k) \in \operatorname{spt}\epsilon$ represents a student of ability $a$,
who cannot improve his net lifetime payoff by training with any teacher other
than the one of skill $k$ that he is paired with under $\epsilon$.

1.2 The planner’s problem and its dual 

Shapley and Shubik's basic insight is that stable matching problems with
transferable utility have a variational reformulation using linear programs
and their duals. In [16] we observe that this insight extends from the
familiar single-stage, single-sector setting of [18] [10] and [1], to
steady-state multi-sector models such as the one introduced above. Denoting
our education and labor market technologies by
$b_\theta(a,k) = b_E((1-\theta)a + \theta k)$ and
$b_{\theta'}(a,k) = b_L((1-\theta')a + \theta' k)$, the quadruple
$(\epsilon,\lambda,u,v)$ forms a competitive equilibrium if and only if
$(u,v)$ attain the infimum

    $LP_* := \inf_{(u,v) \in F_0} \int_{[0,\bar a]} u(a)\,\alpha(da)$    (10)

over (6)–(8), while $(\epsilon,\lambda)$ attain the supremum

    $LP^* := \max \int_{[0,\bar a]\times[0,\bar k]} \left[ c\,b_\theta(a,k)\,\epsilon(da,dk) + b_{\theta'}(a,k)\,\lambda(da,dk) \right]$,    (11)

the maximum being taken over $\epsilon \ge 0$ and $\lambda \ge 0$ on
$[0,\bar a]^2$ satisfying (4)–(5).

We shall henceforth refer to $(u,v) \in F_0$ as optimal if it attains the
infimum (10), and to $(\epsilon,\lambda)$ as optimal if it attains the
supremum (11). Whereas the notion of competitive equilibrium relates $(u,v)$
to $(\epsilon,\lambda)$ through (9), one can discuss optimality of $(u,v)$
without referring to $(\epsilon,\lambda)$, and vice versa. This is the first
of many advantages conferred by our Shapley-Shubik-like reformulation of the
problem at hand.

We often use $\alpha(u)$ as a shorthand notation to denote the integral
appearing in (10), which represents the average student's net lifetime
utility. Similarly, $c\,\epsilon(b_\theta) + \lambda(b_{\theta'})$ denotes the
argument appearing in the supremum (11), and represents the total (non-labor
+ labor) utility produced by the pairings $\epsilon$ and $\lambda$. Thus if
equilibrium wages $(u,v) \in F_0$ exist, they minimize the expected lifetime
utility of students subject to the stability constraints. Similarly, any
equilibrium matches maximize the utility
$c\,\epsilon(b_\theta) + \lambda(b_{\theta'})$ being produced by our model's
two sectors in each generation, subject to the market-clearing constraints
(4)–(5) in steady state. The latter can be interpreted as a social planner's
problem; it is also the linear program dual to (10). Satisfaction of the
budget constraint (9) follows from the absence of a duality gap: the fact
$LP^* = LP_*$, which is established below under the technical hypothesis that
$\alpha$ satisfy a doubling condition at the top skill type $\bar a$, meaning
there exists $C < \infty$ such that

    $\int_{\bar a - 2\Delta a}^{\bar a} \alpha(da) \le C \int_{\bar a - \Delta a}^{\bar a} \alpha(da)$    (12)

for all $\Delta a > 0$. A surprisingly delicate part of the proof is the
inequality $LP^* \le LP_*$ shown in Proposition |H1; the rest of the duality
argument, reproduced in the appendix, is quite standard.
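
To illustrate the doubling condition at the top skill type, the sketch below
uses a hypothetical family of student distributions on $[0,\bar a]$ whose
density is proportional to $(\bar a - a)^p$ with $p > -1$. This family is an
assumption of the example, chosen because the mass of the top interval of
length $\Delta a$ is explicit, namely $(\Delta a/\bar a)^{p+1}$ for
$\Delta a \le \bar a$.

```python
def tail_mass(delta, p, a_bar=1.0):
    """Mass of the top interval of length delta under the density
    (p + 1) * (a_bar - a)**p / a_bar**(p + 1) on [0, a_bar]."""
    if delta <= 0 or p <= -1:
        raise ValueError("need delta > 0 and p > -1")
    return (min(delta, a_bar) / a_bar) ** (p + 1)

def doubling_ratio(delta, p, a_bar=1.0):
    """Ratio of the masses of the top intervals of length 2*delta and delta."""
    return tail_mass(2 * delta, p, a_bar) / tail_mass(delta, p, a_bar)

deltas = [2.0 ** (-j) for j in range(20)]     # 1, 1/2, 1/4, ...
worst_uniform = max(doubling_ratio(d, 0.0) for d in deltas)   # p = 0: uniform
worst_thin = max(doubling_ratio(d, 2.0) for d in deltas)      # thin top tail
```

For this family the ratio never exceeds $2^{p+1}$, so the doubling condition
holds with $C = 2^{p+1}$; a distribution whose top tail thins out faster than
any power would fail it.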

The variational characterization given by (10)–(11) is our starting point
for the further analysis of the payoffs $(u,v)$ and matchings
$(\epsilon,\lambda)$ we seek. To show such competitive equilibria exist, it is
enough to establish that the infimum and supremum are attained. Attainment of
the planner's supremum is standard, as recalled in the appendix. It is less
straightforward to show that the infimum (10) is attained, and to elucidate
the properties of the extremizers for either problem. A continuity and
compactness argument is complicated by the fact that the wage function $v$
appears on both sides of the education sector stability constraint, and has
no obvious upper bound except in $L^1(\bar A,\alpha)$; c.f. (8).

When minimizers $(u,v)$ exist, it is useful to know as much structural
information as we can about them, in order to analyze the properties of the
corresponding equilibrium matches. In the cases for which we have been
able to deduce the existence of minimizers, they turn out to be non-negative,
non-decreasing, convex functions of $a \in [0,\bar a]$. The fact that the
monotonicity and convexity of $u$ and $v$ survive limits is crucial to the
analysis. Indeed, our existence strategy is to first show (10) is minimized
under the additional assumption of convexity and monotonicity for $u$ and
$v$, and then to show that this additional constraint does not bind for the
minimizing $(u,v)$, which must therefore optimize the original problem of
interest. In the absence of an atom at the top skill type,
$\alpha[\{\bar k\}] = 0$, it seems possible a priori that both $u(k)$ and
$v(k)$ diverge to $+\infty$ as $k \to \bar k$, without violating boundedness
of the expected value $LP_* = \alpha(u)$. Although Theorem 16 tends to rule
out this possibility, giving conditions instead for the gradients $u'(a)$ and
$v'(k)$ to diverge, for the intermediate analysis it is useful to let
$A = [0,\bar k[ = K$ denote a half-open interval where we can assume $u$ and
$v$ are real valued.

In addition to $(N,\theta)$ and $(N',\theta')$, dimensionless parameters such
as $\bar b'_L/\underline b'_L > 1$ and $c \ge 0$ govern the behavior displayed
by the model. Here $\underline b'_{E/L}$ is from (2) and

    $\bar b'_{E/L} := \sup_{k \in K} b'_{E/L}(k)$,

so $\bar b'_L/\underline b'_L$ indexes the relative impact of an increase in
skill on labor productivity at the top versus the bottom of the skills
market, while $c$ measures the relative importance of any other satisfactions
derived from cognitive skills apart from the returns to labor which they help
to enhance. Such satisfactions could be intrinsic, or they could represent
externalities that cognitive skills and education provide, such as social
status or (as in early drafts of [16]) marital prospects. We can also remove
this effect from the model by setting $c = 0$. However, to implement the
existence strategy outlined above, it turns out to be technically easier to
analyze the case $c > 0$ first, and then take the limit $c \to 0$ if desired.
Many but not all of our structural results such as uniqueness,
specialization, and positive assortativity also survive this limit; see
Proposition 0 and Theorem [TSl for example.

We shall also investigate occupational specialization by cognitive skill,
showing $\min\{N'\theta', N\theta\} > \bar b'_L/\underline b'_L$ implies that
the highest types become teachers, while the lowest types become either
workers or teachers, but not managers. More refined statements appear in
Proposition 3. For continuously distributed skill types, we show that a
pyramid can form in the education sector, sometimes leading to divergence of
wage gradients at the highest skill type when $N\theta \ge 1$, meaning the
span of control at each node in the pyramid is large enough. More explicitly,
under suitable conditions Theorem 16 asserts that as $k \to \bar k$,


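    $v'(k) \sim \begin{cases} \mathrm{const}\,|\bar k - k|^{-\frac{\log N\theta}{\log N}} & \text{for } N\theta > 1, \\ \dfrac{c\,b_E'(\bar k)}{\frac{1}{N\theta} - 1} & \text{for } N\theta < 1, \end{cases}$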


so a phase transition occurs at $N\theta = 1$. A less involved investigation
of an analogous pyramid structure was given by Becker and Murphy [2, §VII],
in a different production model incorporating the cost of acquiring
knowledge and assuming steady growth as opposed to steady state. To produce a
similar pyramid in the labor sector, our model would need to be modified
to permit managers to manage other managers (as in Garicano [8] with
Rossi-Hansberg [9]) instead of being forced to manage only workers whose
productivity is inherently limited. If such a modification to our model could
be achieved, it would have the potential to complement existing models for
executive compensation such as Gabaix and Landier's [B], which rely instead
on comparing given tail behaviors of the distributions of company size and
managerial talent.

Finally, Corollary 9 characterizes the optimizers in the primal and dual
problems (10)-(11). Theorem [TSl provides sufficient conditions for uniqueness
of $(\epsilon,\lambda)$, and discusses in what sense $(u,v)$ are also unique. It
gives conditions guaranteeing the optimal pairings $\lambda$ of workers with
managers and $\epsilon$ of teachers with students are positive assortative in
cognitive skills, meaning spt $\lambda$ and spt $\epsilon$ are non-decreasing
subsets of the plane. This monotonicity is intimately tied to the convexity
of wages $v$ as a function of $k \in [0,\bar k]$ asserted above.


2 Analysis 


2.1 Terminology and notation 


In this section, we introduce terminology and notation that will be useful for 
dealing with functions which need neither be smooth nor bounded, and with 
the measures which arise naturally as their duals. 

Given any convex set $B \subset \mathbf{R}^n$, a function
$u : B \to \mathbf{R} \cup \{+\infty\}$ is said to be continuous if it is upper
and lower semicontinuous. It is said to be Lipschitz with Lipschitz constant
$L$ if either $u$ is identically infinity or else if

$$L := \sup_{B \ni x \ne y \in B} \frac{|u(x) - u(y)|}{|x - y|}$$

is finite. It is said to be semiconvex with semiconvexity constant $C$ if the
function $x \in B \mapsto u(x) + C|x|^2/2$ is convex. It is said to be locally
Lipschitz (respectively semiconvex) on $B$, if $u$ is Lipschitz (respectively
semiconvex) on every compact convex subset of $B$. Locally Lipschitz
(respectively semiconvex) functions are once (respectively twice)
differentiable Lebesgue a.e. In addition, locally semiconvex functions fail
to be once differentiable on a set of Hausdorff dimension at most $n - 1$.
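The two definitions above can be probed numerically in one dimension. The following is only a sketch on finite samples: the helper names `lipschitz_constant` and `is_semiconvex_on_grid` are hypothetical, and a finite grid can only give a lower bound for $L$ and a necessary check for semiconvexity, not a proof.

```python
import itertools


def lipschitz_constant(u, points):
    """Largest difference quotient |u(x) - u(y)| / |x - y| over distinct sample points.

    This is a lower bound for the true Lipschitz constant L on the sampled set.
    """
    return max(abs(u(x) - u(y)) / abs(x - y)
               for x, y in itertools.combinations(points, 2))


def is_semiconvex_on_grid(u, grid, C, tol=1e-9):
    """Check convexity of x -> u(x) + C x^2 / 2 on an increasing, equally spaced grid.

    Convexity is tested through non-negative second differences (up to tol).
    """
    w = [u(x) + C * x * x / 2 for x in grid]
    return all(w[i - 1] - 2 * w[i] + w[i + 1] >= -tol
               for i in range(1, len(w) - 1))


grid = [i / 10 for i in range(11)]
print(lipschitz_constant(abs, [-1.0, -0.5, 0.0, 0.5, 1.0]))
print(is_semiconvex_on_grid(lambda x: -x * x, grid, C=2.0))
print(is_semiconvex_on_grid(lambda x: -x * x, grid, C=1.0))
```

Here $u(x) = -x^2$ is semiconvex with constant $C = 2$ but not with $C = 1$, which is what the two grid checks report.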

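By support of a Borel measure $\alpha \ge 0$ on $\mathbf{R}^m$, we mean the
smallest closed subset spt $\alpha \subset \mathbf{R}^m$ of full mass:
$\alpha[\mathbf{R}^m \setminus \mathrm{spt}\,\alpha] = 0$. The push-forward
$f_\#\alpha$ of $\alpha$ through a Borel map $f : \mathbf{R}^m \to \mathbf{R}^n$
is a Borel measure defined by $(f_\#\alpha)[Z] = \alpha[f^{-1}(Z)]$ for each
$Z \subset \mathbf{R}^n$. We say $\alpha$ has no atoms if $\alpha[\{x\}] = 0$ for
each $x \in \mathbf{R}^m$. A measure $\epsilon$ on $\mathbf{R}^2$ is said to be
positive assortative if spt $\epsilon$ forms a non-decreasing subset in the
plane: i.e. if $(a' - a)(k' - k) \ge 0$ for all $(a,k), (a',k') \in$ spt
$\epsilon$. We use $\alpha|_B$ to denote the restriction
$\alpha|_B(Z) = \alpha[Z \cap B]$ of $\alpha$ to $B \subset \mathbf{R}^m$, and
$\mathcal{H}^n$ to denote Lebesgue measure on $\mathbf{R}^n$.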
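For finitely supported measures, both the push-forward and the positive assortativity condition reduce to short computations. The sketch below assumes a discrete measure stored as a dictionary from points to masses; the function names and the sample matching are hypothetical illustrations, not objects from the model.

```python
from collections import defaultdict


def push_forward(measure, f):
    """Push a discrete measure {point: mass} forward through the map f."""
    image = defaultdict(float)
    for x, mass in measure.items():
        image[f(x)] += mass
    return dict(image)


def is_positive_assortative(support):
    """Check (a' - a)(k' - k) >= 0 for every pair of points (a, k), (a', k')."""
    pts = list(support)
    return all((a2 - a1) * (k2 - k1) >= 0
               for (a1, k1) in pts for (a2, k2) in pts)


theta = 0.5
# Hypothetical matching of students a with teachers k, stored as {(a, k): mass}.
eps = {(0.0, 0.5): 0.5, (1.0, 1.0): 0.5}
# Adult skills produced by the matching: z(a, k) = (1 - theta) a + theta k.
kappa = push_forward(eps, lambda p: (1 - theta) * p[0] + theta * p[1])
print(kappa)
print(is_positive_assortative(eps))
```

A matching supported on $\{(0,1),(1,0)\}$ would fail the check, since the better student is paired with the worse teacher.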


2.2 The educational pyramid 

In this section, we discuss the extent to which we can expect optimizers
$(u,v)$ to the minimization (10) to be smooth, at least away from the top skill
type $\bar k$. We then apply these results to elucidate the nature of the
pyramid structure which can form in the education sector.

Given $(u,v) \in F_0$ feasible for the infimum (10), use
$b'_{\theta'}(k',k) := b_L((1-\theta')k' + \theta' k)$ and
$z(a,k) = (1-\theta)a + \theta k$ to define the wages implicitly available to an
individual of cognitive skill $k$ employed as a worker, manager, or teacher,
respectively:

$$v_w(k) := \sup_{k' \in K} \Big[ b'_{\theta'}(k,k') - \frac{1}{N'} v(k') \Big], \tag{13}$$

$$v_m(k) := N' \sup_{k' \in K} \big[ b'_{\theta'}(k',k) - v(k') \big], \quad\text{and} \tag{14}$$

$$v_t(k) := N \sup_{a \in A} \big[ c\,b_E(z(a,k)) + v(z(a,k)) - u(a) \big], \tag{15}$$

where we complete definition (15), and later (35), with the convention

$$\infty - \infty := \infty. \tag{16}$$

The suprema (13)-(14) are attained when $u$ and $v$ are proper (hence lower
semicontinuous), and the same holds true for (15) if, in addition, $v$ is convex
non-decreasing (hence continuous).
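The three wage formulas (13)-(15) can be evaluated approximately by replacing each supremum with a maximum over a finite grid. The sketch below makes that simplification; the function names and the sample data ($v \equiv 0$, $u \equiv 0$, linear $b_E$ and $b_L$) are hypothetical and chosen only so the output is easy to check by hand. The convention (16) is not handled, so all inputs are assumed finite.

```python
def worker_wage(k, v, b_L, theta_p, N_p, grid):
    """Grid version of (13): best manager k' for a worker of skill k."""
    return max(b_L((1 - theta_p) * k + theta_p * kp) - v(kp) / N_p
               for kp in grid)


def manager_wage(k, v, b_L, theta_p, N_p, grid):
    """Grid version of (14): best worker type k' for a manager of skill k."""
    return N_p * max(b_L((1 - theta_p) * kp + theta_p * k) - v(kp)
                     for kp in grid)


def teacher_wage(k, u, v, b_E, theta, N, c, grid):
    """Grid version of (15): best student type a for a teacher of skill k."""
    def z(a):
        return (1 - theta) * a + theta * k
    return N * max(c * b_E(z(a)) + v(z(a)) - u(a) for a in grid)


def zero(x):
    return 0.0


def identity(x):
    return x


grid = [0.0, 0.5, 1.0]
print(worker_wage(0.0, zero, identity, 0.5, 2, grid))
print(manager_wage(1.0, zero, identity, 0.5, 2, grid))
print(teacher_wage(0.0, zero, zero, identity, 0.5, 2, 1.0, grid))
```

Note that $v$ must be callable off the grid in `teacher_wage`, because it is evaluated at $z(a,k)$; this is the numerical trace of the recursive structure of the teacher wage.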

Clearly feasibility (6)-(8) implies $v \ge \bar v := \max\{v_w, v_m, v_t\}$. When
equality holds, as we shall see that it does (Theorem 13) for some $v$
minimizing (10), this implies strong conclusions. For example, $v_w$ and $v_m$
inherit Lipschitz and convexity properties from $b_L$ by an envelope argument
(Lemma 2), which $v$ also inherits wherever it coincides with $v_w$ or $v_m$.
Something similar is true but more subtle to verify for $v_t$ (and hence for
$v$) because of the recursive structure built into the educational pyramid; in
(15), as opposed to (13)-(14), this is manifested in the fact that the $k$
dependence in the argument of the supremum involves the unknown function $v$.
As another example, when $N'\theta'$ and $cN\theta$ are large enough,
Proposition 7 derives complete specialization of types into low (workers),
medium (managers), and high (teachers). This at least tells us the role of
$\kappa$-a.e. agent, leaving the distribution
$\kappa = \kappa_w + \kappa_m + \kappa_t$ of adults as the only unknown. Here
$\kappa_w = \lambda^1$, $\kappa_m = \lambda^2/N'$ and $\kappa_t = \epsilon^2/N$
are measures representing the distribution of worker, manager, and teacher
types, and have respective masses
$\kappa_w[K] = \frac{(N-1)N'}{N(N'+1)}$, $\kappa_m[K] = \frac{N-1}{N(N'+1)}$ and
$\kappa_t[K] = \frac{1}{N}$. If $c = 0$ but
$\min\{N'\theta', N\theta\} > \bar b'_L/\underline b'_L$, the same proposition
yields more subtle conclusions.

A first insight into the educational pyramid is provided by the following
example.

Example 1 (Gurus) Fix the number of students each teacher can teach or
the number of workers each manager can manage to be $N = N' = 10$. If
our probability measure $\kappa$ represents the skill distribution for a
population of 110 adults, 90 of them will be workers, managed by 9 managers,
and 11 of them will be teachers. Nine of these 11 = 9 + 1 + 1 will specialize
in teaching workers, one in teaching teachers, and one in teaching a
combination of 9 managers and 1 teacher. We may remember this with the
mnemonic 110 = 90 + 9 + (9 + 1 + 1). On the other hand, if $\kappa$ represents
the skill distribution for a population of
11000 = 9000 + 900 + (900 + 90 + (90 + 9 + (9 + 1 + 1))) adults,
9000 of them will be workers, managed by 900 managers, while 1100 of them
will be teachers. Of these, 900 will teach workers, 90 will teach managers,
and 110 will teach teachers. Within these 110, there is further specialization
as before: 90 will teach teachers who teach workers, 9 will teach teachers who
teach managers, and 11 will teach teachers who teach teachers. Within this
11, 9 teach worker-teacher-teachers, 1 teaches manager-teacher-teachers, and
1 teaches only teacher-teacher-teachers. These last two may be thought of as
'gurus'. One of the questions at stake is whether the salaries of these gurus
can grow without bound as the population size grows.
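The head counts in Example 1 follow from two steady-state bookkeeping rules: each teacher has $N$ students, so teachers are a fraction $1/N$ of the population, and each manager has $N'$ workers. A minimal sketch of that arithmetic follows; the function names are hypothetical, and the recursion simply stops once a tier can no longer be split into whole people.

```python
from fractions import Fraction


def occupations(population, N, N_prime):
    """Split a steady-state population into (workers, managers, teachers)."""
    teachers = Fraction(population, N)
    managers = (population - teachers) / (N_prime + 1)
    workers = N_prime * managers
    return workers, managers, teachers


def teacher_tiers(population, N, N_prime):
    """Successive tiers of the pyramid as (workers, managers, teachers) triples.

    Tier 0 is the whole population. Tier j + 1 counts the teachers of each
    group in tier j, i.e. every entry is divided by the class size N.
    """
    tiers = []
    level = occupations(population, N, N_prime)
    while all(x.denominator == 1 for x in level):
        tiers.append(tuple(int(x) for x in level))
        level = tuple(x / N for x in level)
    return tiers


print(teacher_tiers(110, 10, 10))
print(teacher_tiers(11000, 10, 10))
```

For 11000 adults the tiers are (9000, 900, 1100), (900, 90, 110) and (90, 9, 11); the last group of 11 is where the example's mixed classes and the two 'gurus' appear.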

Next we recall without proof a well-known result which can be proved as
in [7]:

Lemma 2 (Upper envelopes inherit derivative bounds) If
$f : A \times K \to \mathbf{R}$ is locally Lipschitz in $a \in A$, uniformly in
$k \in K$, then $g(a) = \sup_{k \in K} f(a,k)$ is locally Lipschitz and for each
$\delta > 0$ we have the bounds

$$\inf_{k \in K,\ |a'-a|<\delta} f_a(a',k) \ \le\ g'(a) \ \le\ \sup_{k \in K,\ |a'-a|<\delta} f_a(a',k)$$

in the pointwise a.e. senses. Similarly, if $f$ is locally semiconvex in
$a \in A$, uniformly in $k \in K$, then $g(a)$ is locally semiconvex and obeys the
bound

$$g''(a) \ \ge \inf_{k \in K,\ |a'-a|<\delta} f_{aa}(a',k)$$

in the same senses. Here $f_a := \partial f/\partial a$ and
$f_{aa} := \partial^2 f/\partial a^2$.

If $f(a',\cdot)$ extends upper semicontinuously to $\bar K$ for some $a' \in A$,
allowing $f(a',\bar k) = -\infty$ as a possible value, there exists
$k' \in \bar K$ such that $g(a') = f(a',k')$; if $g(a)$ is differentiable at
$a' \in\, ]0,\bar a[$, the envelope theorem then yields $g'(a') = f_a(a',k')$
provided $f(\cdot,k')$ is locally semiconvex near $a'$; similarly,
$g''(a') \ge f_{aa}(a',k')$ provided both functions admit a second order
Taylor expansion with respect to $a$ at $a'$.

Definition 3 (Supermodular) Given intervals $I, J \subset \mathbf{R}$, a function
$f : I \times J \to \mathbf{R}$ is weakly supermodular if

$$f(a,k) + f(a',k') \ \ge\ f(a,k') + f(a',k) \tag{17}$$

for all $a < a'$ in $I$ and $k < k'$ in $J$. It is strictly supermodular if, on
the same domain, the inequality (17) remains strict.

Remark 4 (Supermodular extensions) It is elementary to check that a
function $f$ which is weakly (or strictly) supermodular on $A \times K$ and has
an upper semicontinuous extension to $\bar A \times \bar K$ that is continuous
and real-valued except perhaps at $(\bar a,\bar k)$, is weakly (respectively
strictly) supermodular on $\bar A \times \bar K$.
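Inequality (17) is easy to test on a finite grid. The sketch below is a numerical check only, under the assumption that a grid sample is representative; the function name and the two sample payoffs are hypothetical. The first payoff composes a convex function with a convex combination of its arguments, and the second is a standard submodular example.

```python
import itertools


def is_weakly_supermodular(f, a_grid, k_grid, tol=1e-12):
    """Check f(a,k) + f(a',k') >= f(a,k') + f(a',k) for all a < a', k < k' on the grids."""
    for a, a2 in itertools.combinations(sorted(a_grid), 2):
        for k, k2 in itertools.combinations(sorted(k_grid), 2):
            if f(a, k) + f(a2, k2) < f(a, k2) + f(a2, k) - tol:
                return False
    return True


theta = 0.5
grid = [i / 10 for i in range(10)]


def convex_of_mix(a, k):
    """v(z(a, k)) with the convex choice v(x) = x^2."""
    return ((1 - theta) * a + theta * k) ** 2


def negative_product(a, k):
    """A submodular payoff, which violates (17) for every a < a', k < k'."""
    return -a * k


print(is_weakly_supermodular(convex_of_mix, grid, grid))
print(is_weakly_supermodular(negative_product, grid, grid))
```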

Throughout we assume $\theta, \theta', N, N'$ and $\bar a = \bar k$ are positive
parameters with $\max\{\theta,\theta'\} < 1 < N$, and set $c \ge 0$ and
$A = [0,\bar a[\, = K$. Unless otherwise noted, the utilities
$b_E, b_L \in C^1(\bar K)$ of education and labor have positive lower bounds
$\underline b'_{E/L}$, $\underline b''_{E/L}$ on their first two derivatives
(1)-(2), hence are strictly convex and increasing.

Lemma 5 (Structure of wage functions) Let
$v : \bar K \to \mathbf{R} \cup \{+\infty\}$ be convex non-decreasing, with
$v(\bar k) \ge \limsup_{k \to \bar k} v(k)$. Then $f(a,k) = v(z(a,k))$ will be
weakly supermodular on $\bar A \times \bar K$, and strictly supermodular unless
the convexity of $v$ fails to be strict.

Set $z(a,k) = (1-\theta)a + \theta k$, $b_\theta = b_E \circ z$ and
$b'_{\theta'}(k',k) = b_L((1-\theta')k' + \theta' k)$ where
$b_{E/L} \in C^1(\bar K)$ satisfy (1)-(3). Then the student payoff $u$ defined
by (35) is also convex non-decreasing on $\bar K$ and satisfies
$\frac{u'}{1-\theta} \ge c\,\underline b'_E + \inf_k v'(k)$ and
$\frac{u''}{(1-\theta)^2} \ge c\,\underline b''_E + \inf_k v''(k)$ pointwise a.e.

The worker, manager, and teacher wage functions defined by (13)-(15) and
their maximum $\bar v := \max\{v_w, v_m, v_t\}$ are then monotone and convex on
$\bar K$, real-valued on $K$, and satisfy
$\bar v' \ge \min\{(1-\theta')\underline b'_L,\ N'\theta'\underline b'_L,\ N\theta(c\,\underline b'_E + \inf_k v'(k))\}$
and
$\bar v'' \ge \min\{(1-\theta')^2\underline b''_L,\ (\theta')^2 N'\underline b''_L,\ N\theta^2(c\,\underline b''_E + \inf_k v''(k))\}$
pointwise a.e.

Proof. First note that convexity and monotonicity imply $v$ is continuous
throughout $K = [0,\bar k[$. Any convex $v \notin C^2$ can be approximated by
convex $v_i \in C^2$ locally uniformly on $]0,\bar k[$, with $v_i' \to v'$
pointwise a.e. (and $v_i'' \to v''$ weakly).

Now let $f(a,k) = c\,b_E(z(a,k)) + v(z(a,k))$. For each fixed $k$, we see $f$
is convex non-decreasing as a function of $a \in A$, so the same must be true
of the supremum $u(a) = \sup_{k \in K} f(a,k) - v(k)/N$. Supposing for
simplicity that $v$ and $b_E$ are $C^2$, from

$$f_a(a,k) = \big(c\,b_E'(z(a,k)) + v'(z(a,k))\big)\, z_a(a,k)$$

and $0 \le z(a,k) = (1-\theta)a + \theta k$ we compute bounds

$$c\,\underline b'_E + \inf v' \ \le\ \frac{f_a}{1-\theta} \ \le\ c\,b_E'(z(a,k)) + v'(z(a,k))$$

and

$$\frac{f_{aa}}{(1-\theta)^2} = c\,b_E''(z(a,k)) + v''(z(a,k)) \ \ge\ c\,\underline b''_E + \inf v''$$

which are uniform in $k \in K$. The analogous bounds for $u$ follow from
Lemma 2.

So far, we have been working under the assumption that $v$ and $b_E$ are
$C^2(K)$. More generally, $v$ and $b_E$ can be approximated uniformly on compact
subsets of $K$ by functions $v_i$ and $b_{E,i} \in C^2$ satisfying the same
hypotheses as $v$ and $b_E$. As a result,
$f_i(a,k) := c\,b_{E,i}(z(a,k)) + v_i(z(a,k))$ converges to $f$ uniformly on
compact subsets of $\bar A \times \bar K \setminus \{(\bar a,\bar k)\}$, and
$u_i(a) := \sup_{k \in K} f_i(a,k) - \frac{1}{N} v_i(k)$ converges uniformly to
$u$ on compact subsets of $A$. Thus $u$ inherits the same Lipschitz and local
semiconvexity bounds as $u_i$ in the distributional (and hence pointwise a.e.)
sense. See (33) for the distributional definition of the inequality
$v'' \ge g$.

On the other hand, $f(a,k;\theta) = f(k,a;1-\theta)$ is symmetrical, and
$v_t(k)/N$ is defined by essentially the same formula as $u(a)$, but with the
roles of $a \leftrightarrow k$ and $\theta \leftrightarrow 1-\theta$
interchanged. Thus $v_t$ is also locally Lipschitz and convex on $K$, and
satisfies $v_t' \ge N\theta(c\,\underline b'_E + \inf_k v'(k))$ and
$v_t'' \ge N\theta^2(c\,\underline b''_E + \inf_k v''(k))$.

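Turning to $v_w$ and $v_m$, we apply Lemma 2 but with
$f(a,k) := b'_{\theta'}(a,k) = b_L((1-\theta')a + \theta' k)$, which is jointly
convex and increasing in each variable. Approximating $b_L$ by $C^2(\bar K)$
functions if necessary, shows bounds

$$\underline b'_L = b_L'(0^+) \ \le\ \frac{f_a(a,k)}{1-\theta'} = b_L'((1-\theta')a + \theta' k) \ \le\ b_L'((1-\theta')\bar a + \theta' k) \ \le\ \bar b'_L$$

and

$$\frac{f_{aa}(a,k)}{(1-\theta')^2} = b_L''((1-\theta')a + \theta' k) \ \ge\ \underline b''_L$$

are inherited by the convex increasing functions $v_w$ and $\frac{1}{N'}v_m$ on
$K$. Thus $\bar v = \max\{v_w, v_m, v_t\}$ is convex, non-decreasing, locally
Lipschitz and inherits the bounds
$\bar v' \ge \min\{(1-\theta')\underline b'_L,\ N'\theta'\underline b'_L,\ N\theta(c\,\underline b'_E + \inf_k v'(k))\}$
and
$\bar v'' \ge \min\{(1-\theta')^2\underline b''_L,\ N'(\theta')^2\underline b''_L,\ N\theta^2(c\,\underline b''_E + \inf_k v''(k))\}$
on $K$.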

Finally, setting $f(a,k) = v(z(a,k))$, using convexity of $v \in C^2(K)$ we
compute

$$f(a_0,k_0) + f(a_1,k_1) - f(a_0,k_1) - f(a_1,k_0)
= (1-\theta)\theta \int_{k_0}^{k_1}\!\!\int_{a_0}^{a_1} v''((1-\theta)a + \theta k)\, da\, dk \ \ge\ 0$$

for $a_0 < a_1$ and $k_0 < k_1$. For $v \notin C^2$, the same formulas hold by
smooth approximation of $v = \lim v_i$. Strict inequality holds unless
$v'' = 0$ throughout $]z(a_0,k_0), z(a_1,k_1)[$. This yields the (strict)
supermodularity (17) asserted.


We are now in a position to prove our first main result, which describes
how occupations are allocated according to cognitive skill. It depends on
the relative size of various parameters: the teaching capacity $N$ (resp. $N'$)
and effectiveness $\theta$ (resp. $\theta'$) of teachers (resp. managers) in the
population in question, the range $\bar k$ of cognitive skills, and the relative
utility $c \ge 0$ of cognitive achievement compared to wages. When $N'\theta'$
and $cN\theta$ are large enough it turns out that there is a complete ordering
(a)-(b) of skill types between workers, managers, and teachers in a
steady-state economy. However $N\theta \ge 1$ is enough to ensure that no student
studies with a teacher whose cognitive skills are inferior to their own (d),
while $N'\theta'$ and $N\theta$ large enough guarantee that the most cognitively
skilled types all become teachers (c) (though not that all teachers have high
cognitive skills). This conclusion will help us to establish the phase
transition from bounded to unbounded wage gradients that these teachers enjoy
as $N\theta$ passes through 1 (in section [2TB]). The possibility (f) that the
number $d(k)$ of types of academic descendants a teacher can have may grow
without bound as $k \to \bar k$ foreshadows the analysis there.

Remark 6 Note that in the following proposition, (c) and (d) together imply
(e), meaning at least one of the two inequalities $N\theta \ge 1$ or $c \ge 0$ is
strict. Also note $N'\theta' > \bar b'_L/\underline b'_L$ and
$N\theta > \bar b'_L/\underline b'_L$ are sufficient for (b) and (c),
respectively.

Proposition 7 (Specialization by type; the educational pyramid) Fix
$K = [0,\bar k[$ with $\bar k > 0$, and $c \ge 0$. Suppose
$u, v : \bar K \to \mathbf{R} \cup \{+\infty\}$ are convex, non-decreasing, and
satisfy $v = \max\{v_w, v_m, v_t\}$.

If (a) $N\theta c\,\underline b'_E \ge \bar b'_L \max\{N'\theta', 1-\theta'\}$ then
all teacher types lie weakly above all of the manager and worker types.

If (b)
$N'\theta' \ge (1-\theta') \sup_{k \in K} b_L'((1-\theta')k + \theta'\bar k^-)/b_L'(\theta' k^+)$
then all of the worker types lie weakly below all of the manager types.

If (c) N9 > supo<^<fc&i((l - 9')z- + 9'k)/{b'^{9'z+) + jf^b'E{z+)) and 
(b) holds, and f{a,k) := u{a) + j^v^k) — cbE{z{a, k)) — v{z{a,k)) vanishes 
at some {a,k) G K x K where v{z{a,k)) = k)), then v > Vm on 

]k,k]. In other words, no manager (or worker) can have a type higher than 
a teacher of managers. 

If (d) $N\theta \ge 1$, then any student of type $a \in K$ will be weakly less
skilled than his teacher, and strictly less skilled if (e) either $c > 0$ or
$N\theta > 1$ in addition.

If (f) either $c > 0$ or $v'(0^+) > 0$, then (d)-(e) imply all academic
descendants of a teacher with skill $k \in K$ will display one of at most
finitely many $d = d(k)$ distinct skill types, unless differentiability of $v$
fails at $k$. However, $d(k)$ may diverge as $k \to \bar k$, in which case
$v'(k) \to +\infty$ at a rate related to $d(k)$ by (22).

Proof. Lemma 5 asserts convexity of $v_{w/m/t}$, hence one-sided
differentiability everywhere, and two-sided differentiability except perhaps
at countably many points. At points $k \in\, ]0,\bar k[$ of differentiability,
Lemma 2 (the envelope theorem) allows us to estimate the wage gradients

$$v_w'(k) = (1-\theta')\, b_L'(z'(k,k_m')) \ \in\ (1-\theta')[\underline b'_L, \bar b'_L] \tag{18}$$

$$v_m'(k) = N'\theta'\, b_L'(z'(k_w',k)) \ \in\ N'\theta'[\underline b'_L, \bar b'_L] \tag{19}$$

$$v_t'(k) = N\theta\big(c\,b_E'(z(a,k)) + v'(z(a,k))\big) \ \ge\ N\theta c\, b_E'(\theta k) \tag{20}$$

where $k_m'$, $k_w'$ and $a$ are the respective points at which the suprema
(13)-(15) (or their extension to $\bar K$) are attained. Such points exist in
$\bar K$ according to the same lemma; we can extend
$v(\bar k) = v(\bar k^-) := \lim_{k \uparrow \bar k} v(k)$ and
$u(\bar k) \in \mathbf{R} \cup \{+\infty\}$ similarly without changing
$v_{w/m/t}$. Consideration of the worst-case scenario $k_w' = 0$ and
$k_m' = \bar k$ in (18)-(19) shows if (b) holds that $v_w'(k) < v_m'(k)$ at each
point $k$ where both derivatives are defined. Then the locally Lipschitz
function $v_m - v_w$ is strictly increasing. Since this function is
non-positive on $\{k \mid v = v_w\}$ and non-negative on $\{k' \mid v = v_m\}$,
the first set must lie entirely to the left of the second, as desired.

Estimating the wage gradient for a teacher of type $k_0 = k \in K$ is more
subtle, due to the recursive nature of formula (20). Since the student of
ability $a_1 = a$ taught by $k_0$ winds up with cognitive skill
$k_1 = (1-\theta)a_1 + \theta k_0 = z(a_1,k_0)$, we find

$$v_t'(k_i) = N\theta\big(c\,b_E'(k_{i+1}) + v'(k_{i+1})\big) \tag{21}$$

for $i = 0$, assuming differentiability of $v_t$ at $k_0$. Differentiability of
$v$ and $b_E$ at $k_1$ (and also of $v_t \le v$) follows from convexity, since
replacing $k$ by $k_0$ produces equality in
$u(a_1) + \frac{1}{N}v_t(k) - v(z(a_1,k)) - c\,b_E(z(a_1,k)) \ge 0$: the
first-order condition

$$(v' + c\,b_E')(z(a_1,k_0)^-) \ \ge\ \frac{1}{N} v_t'(k_0)/z_k(a_1,k_0) \ \ge\ (v' + c\,b_E')(z(a_1,k_0)^+)$$

forces the one-sided derivatives
$(v' + c\,b_E')(k_1^-) \le (v' + c\,b_E')(k_1^+)$ to agree. From (20) we have
$v_t'(k_0) \ge N\theta c\,\underline b'_E$, which dominates
$(1-\theta')\bar b'_L$ and $N'\theta'\bar b'_L$ in case (a). Since $v_{w/m/t}$ are
locally Lipschitz, monotonicity of $v'(k)$ then combines with the estimates
(18)-(19) already established to show all teacher types $k_0$ are at least as
high as the highest worker and manager types.

From (d) $N\theta \ge 1$ and (21) we conclude $v'(k_1) \le v_t'(k_0)$, and this
inequality is strict if (e) also holds, in which case every student studies
with a teacher more skilled than himself, or (what is equivalent in our model)
no student (except the very top type $a = \bar a$) becomes as skilled as his
teacher.

Next, assume as in case (c), that a teacher of type k & K teaches a 
student of type a who becomes a manager of type z = z{a, k). Since v >Vm 
with equality at z, we have v\z^) > v'^{z^). Analogously to (lT^ - fl2U]) we 
hnd 

-^v't{k+) > c6'e(z+)+ u(„(z+) 

> cb'^iz-^) + N'9'b'^{{l - 9')k^ + 9'z-^), 

> c 6 ' e (^+) + N'9'b'L{9'z^) 

with equality holding in the hrst two estimates if all the derivatives in ques¬ 
tion exist. On the other hand, 

v'„(k) < AfWJ(l - «>-+ 




since (b) implies the worker types all lie below the manager type Hypothe¬ 
sis (c) now yields v[{k'^) > v'^ik), and the convexity of Vt and strict convexity 
of Vm shown as in Lemma [5] then imply v'^ > v'^ on ]k,k[. Vanishing of the 
non-negative function / at (a,/c) implies Vt{k) = v{k) > Vm{k), whence the 
desired conclusion vt > Vm follows on ]k, k] by integration. 

Case (f) is more delicate, and our conclusions for it are more involved. If
the student $a_1$ above elects to become a worker or manager, we can estimate
$v'(k_1)$ using (18)-(19). However, if the student becomes a teacher whose
students' innate ability $a_2$ allows them to acquire human capital
$k_2 = z(a_2,k_1)$, we must iterate (21). And if these students in turn become
teachers teaching students of ability $a_3$ to acquire human capital
$k_3 = z(a_3,k_2)$, we must iterate again, and continue iterating until the
student of ability $a_d$ who acquires human capital $k_d = z(a_d,k_{d-1})$
elects to become a worker or manager instead of another teacher. Assuming
(d)-(f), we claim this occurs for some finite $d$; otherwise the skills
$k_{i+1} < k_i$ converge to some $k_\infty \in K$, for which the limit of (21)
yields an identity
$(\frac{1}{N\theta} - 1)\,v'(k_\infty^+) = c\,b_E'(k_\infty^+)$ equating
quantities with different signs. Recalling $v'(k_\infty^+) \ge 0$ and
$c \ge 0$, hypothesis (f) asserts at least one of these inequalities is strict,
while (d) asserts $N\theta \ge 1$. Unless $N\theta = 1$ and $c = 0$, this
contradicts the limiting identity. But $N\theta = 1$ and $c = 0$ contradicts
(e). Thus the sequence $k_i$ terminates at some finite $d$ (which depends on
$k_0$).
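The iteration of (21) described above can be unrolled mechanically once the chain of skills is known. The sketch below assumes that every intermediate descendant is a teacher at a point of differentiability, so that $v_t' = v'$ along the chain; the function name and all numerical inputs are hypothetical. It illustrates how each generation multiplies the gradient by $N\theta$, which is the source of the different behavior for $N\theta \ge 1$ and $N\theta < 1$.

```python
def unroll_wage_gradient(n_theta, c, b_e_prime_along_chain, terminal_gradient):
    """Iterate v'(k_i) = N*theta * (c * b_E'(k_{i+1}) + v'(k_{i+1})) back to k_0.

    b_e_prime_along_chain lists b_E'(k_1), ..., b_E'(k_d); terminal_gradient is
    v'(k_d), the wage gradient of the final worker or manager in the chain.
    """
    gradient = terminal_gradient
    for b_prime in reversed(b_e_prime_along_chain):
        gradient = n_theta * (c * b_prime + gradient)
    return gradient


# Two generations with N*theta = 2: 2 * (1 + 2 * (1 + 3)) = 18.
print(unroll_wage_gradient(2.0, 1.0, [1.0, 1.0], 3.0))
# With N*theta = 1 and c = 0 the gradient is simply passed along unchanged.
print(unroll_wage_gradient(1.0, 0.0, [5.0] * 4, 3.0))
# With N*theta = 1/2 a long chain settles near c * b_E' / (1/(N*theta) - 1) = 1.
print(unroll_wage_gradient(0.5, 1.0, [1.0] * 60, 0.0))
```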

At this point we have 



where we have summed the geometric series and estimated kd > O^k^. ■ 

2.3 Characterization of optimality 

When we turn to the question of existence of optimal payoffs $(u,v)$ for the
linear program (10), our strategy will be to perform the minimization under
the additional assumption that $u$ and $v$ are convex non-decreasing, and then
to show these additional constraints are non-binding at the optimum, thus
have no effect on the outcome. Convexity and monotonicity provide the
requisite compactness for extracting limits from minimizing sequences. In
order to show these constraints are non-binding however, it is necessary to
control the payoff $u(a)$ on the full interval $A = [0,\bar a[$, and not only on
spt $\alpha$. Similarly it is necessary to control $v$ on the full interval
$K = A$, and not only on the support of the unknown distribution $\kappa$ of
adult skills. Since the original problem is largely insensitive to the values
of $u$ and $v$ outside spt $\alpha$ and spt $\kappa$, we introduce a perturbed
version of the problem to provide this control: for each $\delta \ge 0$ set

$$LP(\delta)_* := \inf_{(u,v) \in F_\delta} \delta\langle u + v\rangle_A + \int_A u(a)\,\alpha(da) \tag{23}$$

where $\langle v\rangle_A := \frac{1}{\mathcal{H}^1(A)}\int_A v\, d\mathcal{H}^1$
denotes the Lebesgue average of $v$ over $A$. Here $F_\delta = F_0$ denotes the
same feasible set as before, with a subscript denoting only the possible
dependence of the constant $c = c_\delta$ in (6) on $\delta \ge 0$. Also $u$ (and
hence $v$) $\in L^1(A,\alpha)$, and if $\delta > 0$ then
$u, v \in L^1(A,\mathcal{H}^1)$. We must first solve the perturbed problem (23)
and then extract the $\delta \to 0$ limit. For the latter endeavor and to
characterize the optimizers, it will be crucial to know $LP(\delta)_*$ is in
fact dual to

$$LP(\delta)^* := \max_{\epsilon,\lambda \ge 0 \text{ on } \bar A \times \bar K \text{ satisfying (25)-(26)}} c_\delta\,\epsilon(b_E \circ z) + \lambda(b_L \circ z') \tag{24}$$

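where

$$\epsilon^1 = \alpha + \frac{\delta}{\bar a}\,\mathcal{H}^1|_A \tag{25}$$

and

$$z_\#\epsilon + \frac{\delta}{\bar k}\,\mathcal{H}^1|_K = \lambda^1 + \frac{1}{N'}\lambda^2 + \frac{1}{N}\epsilon^2. \tag{26}$$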

Let us begin by verifying $LP(\delta)^* \le LP(\delta)_*$. This would be standard
if the primal infimum were restricted to continuous bounded functions
$u, v \in C(\bar A)$, as in the appendix where the reverse inequality and
attainment of the dual maximum are verified. However, a priori we know only
that $u, v$ differ from continuous bounded functions by non-decreasing
functions, and even a posteriori we do not know whether or not minimizers of
(10) or (23) are bounded at $\bar k$. We have only the conditional result of
Theorem (TB] to suggest that they are. Thus we are forced to work in a space
which includes unbounded functions, and to check their inclusion does not
spoil the otherwise elementary duality inequality
$LP(\delta)^* \le LP(\delta)_*$.

Proposition 8 (Easy direction of duality for unbounded functions)

Fix 6, cs non-negative and 9, 6', N, N', d = k positive with max{6*, 6^'} < 1 < 
N. Let a be a Borel probability measure on A, where A = [0,a[= K, and 
define z{a, k) = {1 — 6)a + 6k, bg = bE o z and b'gfia, k) = &l((1 — 6')a + 9'k) 
where bE/i ^ C^{K). If Borel measures (e, A) and Borel functions {u,v) G 
are feasible for the primal and dual problems fl25]) -fl21 |) . with u G L^{A,a) 
and uS, vS G L^{A, H^), then a{u) -\- S{u v )a > cse(bg) + A(6g,) provided v G 
L^{A,z^e). If a satisfies the doubling condition (fT^ . then v G L^{A, z#e). 

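Proof. Taking feasible pairs $(\epsilon,\lambda)$ of measures and
$(u,v) \in F_\delta$ of functions with $u \in L^1(A,\alpha)$ and
$\delta u, \delta v \in L^1(A,\mathcal{H}^1)$, the stability constraint for the
education sector implies

$$u(a) - c_\delta b_E(z(a,k)) \ \ge\ v(z(a,k)) - \frac{1}{N}v(k), \tag{27}$$

on $\bar A \times \bar K$, and the left hand side is in $L^1(A^2,\epsilon)$. Thus

$$+\infty > \alpha(u) - c_\delta\epsilon(b_\theta) + \delta\langle u + v\rangle_A
\ \ge\ \delta\langle v\rangle_K + \int_{A \times K} \Big[v(z(a,k)) - \frac{1}{N}v(k)\Big]\epsilon(da,dk) \tag{28}$$

since $\epsilon^1 = \alpha + \frac{\delta}{\bar a}\mathcal{H}^1|_A$. On the other
hand, the steady state constraint
$z_\#\epsilon + \frac{\delta}{\bar k}\mathcal{H}^1|_K = \lambda^1 + \frac{1}{N'}\lambda^2 + \frac{1}{N}\epsilon^2$
combines with the stability constraint
$v(a) + \frac{1}{N'}v(k) \ge b'_{\theta'}(a,k)$ for the labor sector to imply

$$\delta\langle v\rangle_K + \int_K v\, d\Big(z_\#\epsilon - \frac{1}{N}\epsilon^2\Big) = \int_K v\, d\Big(\lambda^1 + \frac{1}{N'}\lambda^2\Big) \tag{29}$$

$$\ge \int_{A \times K} b'_{\theta'}\, d\lambda \tag{30}$$

$$\ge 0.$$

Now if $v \in L^1(A, z_\#\epsilon)$ we can equate the right hand side of (28)
with the left hand side of (29) to obtain the first stated conclusion.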

We must still show that the doubling flT^ of a at a implies 0 < n G 
L^{A, z^e). Recall that {u,v) = (uq + Ui,Vo + fi) with MqWo ^ C{A) and 
Ui,Vi ■. A —> [0, cxo] non-decreasing (in fact strictly increasing without loss of 
generality). Since Uq is bounded there is no question about its integrability. 
We shall use v <u from ([H]) and u G L^{A,a) to deduce vi G L^{A,k) for 
$\kappa := z_\#\epsilon$. Since $v_1$ is strictly increasing,
$v_1^{-1}(y) \in \mathbf{R} \cup \{\pm\infty\}$ can be defined unambiguously.
Lemma dU, the doubling condition and the layer-cake
representation m of the Lebesgue integral imply


Vi{k)K{dk) 


IK 



(31) 


for some C < oo. On the other hand, vi < uq + ui — vq < ui + const yields 
— const) < so 


«[«! {y),a]dy 


j Ui{a)a{da) < -|-oo 


implies hniteness of fl^ and completes the proof that Vi G L^{K, n). m 


Corollary 9 (Characterizations of optimality) Fix6,cs,9,9',N,N',z,he,bg, 
and k = d as in Proposition [3 and a Borel probability measure a on A sat¬ 
isfying ma, where A = [0,a[= K and a G spta. A pair of feasible mea¬ 
sures e, A > 0 on maximizes the dual problem (|24]) if there exist feasible 
{u,v) G Fs such that a{u) -f 5{u + v)a = cse(bg) \{bg,). 

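Conversely, $(u,v) \in F_\delta$ minimize the primal problem if and only if
there exist $\epsilon, \lambda \ge 0$ feasible for the dual problem such that
$\alpha(u) + \delta\langle u + v\rangle_A = c_\delta\epsilon(b_\theta) + \lambda(b'_{\theta'})$.

For feasible pairs in the given spaces,
$\alpha(u) + \delta\langle u + v\rangle_A = c_\delta\epsilon(b_\theta) + \lambda(b'_{\theta'})$
is equivalent to the assertions $\epsilon(f) = 0 = \lambda(g)$ where
$f(a,k) = u(a) + \frac{1}{N}v(k) - c_\delta b_\theta(a,k) - v(z(a,k)) \ge 0$ and
$g(k',k) = v(k') + \frac{1}{N'}v(k) - b'_{\theta'}(k',k) \ge 0$ on
$\bar A \times \bar K$.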

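Proof. Let $(\epsilon,\lambda)$ be a pair of feasible measures for the dual
problem, and $(u,v) \in F_\delta$ so that $u \in L^1(A,\alpha)$ and
$\delta u, \delta v \in L^1(A,\mathcal{H}^1)$ and $f, g \ge 0$ when defined as
above. Then Proposition 8 asserts $v \in L^1(A, z_\#\epsilon)$ and
$c_\delta\epsilon(b_\theta) + \lambda(b'_{\theta'}) \le LP(\delta)^* \le LP(\delta)_* \le \alpha(u) + \delta\langle u + v\rangle_A$.
If $c_\delta\epsilon(b_\theta) + \lambda(b'_{\theta'}) = \alpha(u) + \delta\langle u + v\rangle_A$
this forces this chain of inequalities to become equalities, showing
$(\epsilon,\lambda)$ and $(u,v)$ to optimize their respective problems.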

The converse is proved using the result $LP(\delta)^* = LP(\delta)_*$, which
follows by combining the same proposition with Theorem 18. Suppose
$\alpha(u) + \delta\langle u + v\rangle_A = LP(\delta)_*$, meaning $(u,v)$ is
optimal. Lemma 17 provides $(\epsilon,\lambda)$ such that
$c_\delta\epsilon(b_\theta) + \lambda(b'_{\theta'}) = LP(\delta)^*$.

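Finally, we claim that
$c_\delta\epsilon(b_\theta) + \lambda(b'_{\theta'}) = \alpha(u) + \delta\langle u + v\rangle_A$
is equivalent to $\epsilon(f) = 0 = \lambda(g)$. This follows from the chain of
inequalities which establish
$c_\delta\epsilon(b_\theta) + \lambda(b'_{\theta'}) \le \alpha(u) + \delta\langle u + v\rangle_A$
in Proposition 8: $\epsilon(f) = 0$ is equivalent to equality in (28),
$\lambda(g) = 0$ is equivalent to equality in (30), and when both of these hold
then
$c_\delta\epsilon(b_\theta) + \lambda(b'_{\theta'}) = \alpha(u) + \delta\langle u + v\rangle_A$. ■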

Remark 10 (Converse) According to Theorems fidl and {TR, the sufficient
condition for optimality of $(\epsilon,\lambda)$ given by Corollary 9 is also
necessary.

2.4 Optimal wages for the primal problem 

Using the foundations laid in the previous sections, we are ready to
demonstrate the existence of optimal wages $v(k)$ and payoffs $u(a)$ for the
primal problem (10). This is done using a compactness and (lower
semi-)continuity argument for the perturbed problem (23), and then taking the
limit $\delta \to 0$. For $\delta > 0$, we assume $v$ is convex nondecreasing, and
then use Lemma 5 and the characterization $v = \max\{v_w, v_m, v_t\}$, which
identifies the wage of an ability $k$ adult with the maximum he can earn as a
worker, manager or teacher, to show the convexity and monotonicity
assumptions on $v$ do not bind, so play no role in the outcome of our
(infinite-dimensional) linear program. Thus convexity of the wages in our
model emerges for reasons which
manifest rather differently than in Rosen’s investigation of superstars [El- 
Compactness for convex non-decreasing $v$ is asserted in the following
lemma. Some delicacy is required to show that if $u$ or $v$ diverges to
$+\infty$, then both do so on the same half-open interval, and at a uniform
rate.

Lemma 11 (Compactness for wage functions) Fix $K = [0,\bar k[$ and
$g \in L^1_{loc}(K)$. A sequence $v_i : K \to [0,\infty[$ of convex
non-decreasing functions satisfying $v_i''(k) \ge g(k)$ a.e., admits a
subsequence which converges pointwise to a limit $v_0 : K \to [0,\infty]$ which
is real valued on $[0,k_0[$, and infinite on $]k_0,\bar k[$, for some
$k_0 \in [0,\bar k]$. The convergence is uniform on compact subsets of
$[0,k_0[$, and the analogous bound $v_0''(k) \ge g(k)$ holds a.e. on its
interior. Furthermore, for $a > k_0$

$$u_i(a) := \max_{k \in [0,\bar k[} c\,b_E(z(a,k)) + v_i(z(a,k)) - \frac{1}{N}v_i(k)$$

diverges to $u_0(a) = +\infty$ as $i \to \infty$ along the subsequence described
above, where $b_E \in C^1(\bar K)$ satisfies (1)-(3), $c > 0$ and
$z(a,k) = (1-\theta)a + \theta k$.

Proof. The fundamental theorem of calculus yields 

fk' 

vfik') = Ui(0) + / v[{k)dk. (32) 

Jo 

Since 0 < v'fik) is non-decreasing for each i, Helly’s selection theorem pro¬ 
vides a subsequence converging to a non-decreasing limit Uo(fc) on K, ex¬ 
cept possibly at discontinuities of Vq in ]0, fc[. Choose a further subsequence 
for which uo(0) := limj_;.oo'i^iO)(0) converges; unless such a sequence exists, 
Uo(0) = +00 and the lemma follows immediately with ko = 0. Therefore 
assume Uo(0) < oo and choose ko G [0, k] so that v'^^k) < oo for k < ko and 
Uq(/c) = oo ioT k > ko- For k' < ko, Lebesgue’s dominated convergence theo¬ 
rem allows us to take i(j) —)■ oo in (l32|) . to obtain a continuous limit vo{k') 
on [0,/i;o[. It follows that —)> vo uniformly on compact subsets of [0,/i;o[. 

Monotonicity of v[ ensures va^jfik) —)■ oo for each k > ko- For u,, g G the 
inequality v'J > g holds in the distributional sense — meaning 

[f"{k)vi{k) - f{k)g{k)]dk > 0 (33) 

for each smooth compactly supported test function 0 < / G ^“(jO,^)) — 
if and only if it holds in the a.e. sense. Thus v'J > g distributionally, and 
the bound Vq > g follows on ]0, /co[, using Lebesgue’s dominated convergence 
theorem again. Taking g = 0 shows Vq is convex on ]0, ko[, for example. 

Now if a > ko, taking k = ko implies ko < z{a, fc) = (1 — 9)a + 9k, thus 
Ui{j){a) > cvi(^jfiz{a, ko)) diverges to -|-cxo as j ^ oo. ■ 

Corollary 12 (Convergence uniform from below) Suppose a sequence
$v_i : [0,\bar k[\, \to [0,\infty[$ of functions satisfying the hypotheses of
Lemma 11 converges pointwise to $v_0 : [0,\bar k[\, \to [0,\infty]$ which is real
valued on $[0,k_0[$ and infinite on $]k_0,\bar k[$ for some $k_0 \in [0,\bar k]$.
If $v_0(k_0^-) := \lim_{k \uparrow k_0} v_0(k) < +\infty$ then

$$0 \ \le\ \liminf_{i \to \infty}\ \inf_{k \in [0,k_0[} v_i(k) - v_0(k). \tag{34}$$

On the other hand, if $v_0(k_0^-) = +\infty$ then the sequence grows uniformly in
the sense that for each $c < \infty$ taking $i' < \infty$ large enough implies
$v_i(k) > c$ for all $k > k_0 - 1/i'$ and $i > i'$.

Proof. Given $\delta > 0$, taking $k_1 < k_0$ sufficiently large makes
$v_0(k_1) > v_0(k_0^-) - \delta/2$. Taking $i$ sufficiently large then ensures
$v_i(k_1) > v_0(k_0^-) - \delta$, whence for all $k \in [k_1,k_0[$ monotonicity
yields $v_i(k) > v_0(k) - \delta$. Since the convergence $v_i \to v_0$ is uniform
on $[0,k_1]$, this concludes the corollary in case $v_0(k_0^-) < +\infty$ is
finite.

If $v_0(k_0^-) = +\infty$, given $c < \infty$ take $i'$ sufficiently large that
$v_0(k_0 - 1/i') > c$ and then larger still to ensure $v_i(k_0 - 1/i') > c$ for
all $i > i'$. Monotonicity again concludes the proof. ■

Theorem 13 (Existence of minimizing wages) Fix c > 0 and positive 
9, 9', N, N' and d = k with iaax{9, 0'} < 1 < N. Set A = [0, d[= K and let 
a be a Borel probability measure on A satisfying the doubling condition flT^ 
at d E spta. Define z{a,k) = (1 — 9)a + 9k, bg = bE o z and bg,{a,k) = 
6l((1 — 9')a + 9'k), where bE/r ^ C^{K) satisfy ([I])-(IS])- Then infimum fITU]) 
is attained by functions {u,v) satisfying v = max{vw,Vm,Vt} on K = [0, fc] 
and 

$$u(a) = \sup_{k \in \bar K} c\,b_E(z(a,k)) + v(z(a,k)) - \frac{1}{N}v(k) \tag{35}$$

on $\bar A$, where the $v_{w/m/t}$ are defined by (13)-(15); here
$u, v : \bar A \to\, ]0,\infty]$
are continuous, convex, non-decreasing, and — except perhaps at d — real¬ 
valued. For j G {1,2}, ifN9^ > 1 thend^v/dy > mm{{l — 9'y, {d'fi N'}. 

Proof. Fix $0 < \delta < 1$ and $c_\delta := c > 0$ positive; if we prefer $c = 0$ set
$c_\delta = \delta$ in the $\delta \to 0$ limit procedure which follows. We are going to study the
perturbed primal problem (23) under the same feasibility constraints (6)-(8)
as (10) (which include $u \in L^1(A,\alpha)$) plus the artificial constraint that
$v$ be convex nondecreasing. From (8), both $u$ and $v \in L^1(A,\alpha)$ and have
positive lower bounds. For $\delta > 0$ we assume $u,v \in L^1(A,H^1)$ without loss
of generality, since otherwise the term $\delta\langle u + v\rangle_A = +\infty$ makes the objective
diverge. Feasibility of the pair {u,v) = (1 + csbE/bLA)bL yields an upper 
bound (1+2(5)(c^fe^j+fei,) for the infimum fl2^ . As remarked after ([H]), we may 
always replace $u$ and $v$ by their lower semi-continuous hulls without violating
feasibility. Since $\alpha \ge 0$, this only improves the objective (23); for the same
reason, it costs no generality to henceforth suppose $u$ to be related to $v$ by
(35). Lemma 5 then implies both $v$ and $u$ are convex and non-decreasing,
hence continuous as extended real-valued functions.

Lemma 11 allows us to extract a subsequential limit $(u_\delta, v_\delta)$ satisfying
the same constraints from any sequence of approximate minimizers for (23).
Fatou's lemma ensures the limit $(u_\delta, v_\delta)$ minimizes the objective subject to
these constraints. Replacing the monotone convex functions $u_\delta$ and $v_\delta$ again
by their lower-semicontinuous hulls ensures both are continuous. Since $\bar a \in
\operatorname{spt}\alpha$, our a priori bound $(1 + 2\delta)(c_\delta \bar b_E + \bar b_L)$ on the objective implies the
non-decreasing functions $u_\delta(a)$ and $v_\delta(k)$ are finite, except possibly at $\bar a$ and
$\bar k$, and

$$\int_A u_\delta(a)\,\alpha(da) \le (1 + 2\delta)(c_\delta \bar b_E + \bar b_L). \qquad (36)$$

Notice equality must hold in

$$u_\delta(a) \ge \sup_{k \in [0,\bar k]} \; c_\delta b_E(z(a,k)) + v_\delta(z(a,k)) - \tfrac{1}{N} v_\delta(k) \qquad (37)$$

since otherwise replacing $u_\delta$ by the right-hand side of (37) yields a feasible
pair which lowers the objective functional, contradicting the asserted
optimality. Use $(u,v) = (u_\delta, v_\delta)$ to define $(v_w^\delta, v_m^\delta, v_t^\delta) := (v_w, v_m, v_t)$ and

$$\tilde v_\delta := \max\{v_w^\delta, v_m^\delta, v_t^\delta\}.$$

Feasibility implies $v_\delta \ge \tilde v_\delta$, and Lemma 5 implies $\tilde v_\delta$ is continuous on $K$,
convex increasing on $K$, and satisfies

v's > min{(l — 6'')6^, N'6'l/i^, {cs!/e + ^^^1 (38) 

< > mm{{l-e')X,{e'YN%,{cs!/;, + miv'^)Ne^} (39) 

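on $]0,\bar k[$. If $\eta := v_\delta - \tilde v_\delta$ is positive somewhere, it is positive on an interval
where the only binding constraints can be $v_\delta' = 0$ or $v_\delta'' = 0$. For small $\lambda > 0$,
the perturbation $v^\lambda := (1-\lambda) v_\delta + \lambda \tilde v_\delta$ respects these differential constraints.
We will now show the pair $(u_\delta, v^\lambda)$ respects the other constraints as well;
unless the continuous function $\eta = 0$ throughout $K$, this pair lowers the
objective functional, a contradiction forcing $v_\delta = \tilde v_\delta$.

Since $v^\lambda = v_\delta - \lambda\eta = \tilde v_\delta + (1-\lambda)\eta$, for $k',k \in K$ we find

$$v^\lambda(k') + \tfrac{1}{N'} v^\lambda(k) - b'_{\theta'}(k',k) = \tilde v_\delta(k') + \tfrac{1}{N'} v_\delta(k) - b'_{\theta'}(k',k) + (1-\lambda)\eta(k') - \tfrac{\lambda}{N'}\eta(k)$$
$$\ge \eta(k')\Big[1 - \lambda\Big(1 + \frac{\eta(k)}{N'\eta(k')}\Big)\Big], \qquad (40)$$

and also

$$v^\lambda(k') + \tfrac{1}{N'} v^\lambda(k) - b'_{\theta'}(k',k) = v_\delta(k') + \tfrac{1}{N'} \tilde v_\delta(k) - b'_{\theta'}(k',k) - \lambda\eta(k') + \tfrac{1-\lambda}{N'}\eta(k)$$
$$\ge \frac{\eta(k)}{N'}\Big[1 - \lambda\Big(1 + \frac{N'\eta(k')}{\eta(k)}\Big)\Big]. \qquad (41)$$

If both $\eta(k') > 0$ and $\eta(k) > 0$ are non-zero, then taking $\lambda < 1/2$ ensures
either (40) or (41) is positive. The same conclusion remains true if one of
$\eta(k')$ or $\eta(k)$ vanishes. If both vanish, there is nothing to prove.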

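On the other hand, adding $u_\delta(a) - c_\delta b_E(z(a,k))$ to

$$\tfrac{1}{N} v^\lambda(k) - v^\lambda(z(a,k)) = \tfrac{1}{N} \tilde v_\delta(k) - v_\delta(z(a,k)) + \tfrac{1-\lambda}{N}\eta(k) + \lambda\eta(z(a,k))$$

shows

$$u_\delta(a) + \tfrac{1}{N} v^\lambda(k) - c_\delta b_E(z(a,k)) - v^\lambda(z(a,k)) \ge \tfrac{1-\lambda}{N}\eta(k) + \lambda\eta(z(a,k)) \ge 0$$

as desired, since $\tilde v_\delta \ge v_t^\delta$. This establishes $v_\delta = \tilde v_\delta$ on $K$. At $\bar k$, convexity
implies upper semicontinuity of $\tilde v_\delta$ and it is dominated by the continuous
function $v_\delta$, so the identity $v_\delta = \tilde v_\delta$ extends to $\bar K$.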

As a consequence of (38)-(39), for $\delta > 0$ both $v_\delta'$ and $v_\delta''$ are bounded
away from zero so the constraints $\min\{v', v''\} \ge 0$ are not binding. We claim
$(u_\delta, v_\delta)$ must also minimize the linear program (23) even among feasible pairs
which do not satisfy these additional constraints. To see this, we'll suppose
the objective was lower at some other feasible pair $(u,v) \in F_0$ and derive a
contradiction. If $u,v \in C^2(A)$, then the pair $(1-s)(u_\delta, v_\delta) + s(u,v) \in F_0$
also lowers the objective for $s > 0$ sufficiently small, and inherits the strict
convexity and monotonicity of $(u_\delta, v_\delta)$ to produce the desired contradiction.
If $u,v \notin C^2(A)$, the same contradiction will be obtained after approximating
$(u,v)$ by a smooth feasible pair. We can at least assume $u$ and $v$ are
continuous and bounded according to the proof of Theorem 18. The
Stone-Weierstrass theorem then shows $u$ and $v$ can be approximated uniformly by
smooth functions $(u_\sigma, v_\sigma)$ such that $u + \sigma < u_\sigma < u + 2\sigma$ and $v < v_\sigma < v + \sigma$
as $\sigma \to 0^+$. In this case, $(u_\sigma, v_\sigma) \in F_0$ follows from $(u,v) \in F_0$. Convergence
of the objective function to its limiting value as $\sigma \to 0$ is readily verified.
This establishes the desired contradiction, hence the minimality of $(u_\delta, v_\delta)$ in
$F_0$.

Now Corollary 9 asserts there are non-negative measures $\epsilon_\delta \ge 0$ and
$\lambda_\delta \ge 0$ satisfying the perturbed feasibility constraints such that

$$\alpha(u_\delta) + \delta\langle u_\delta + v_\delta\rangle_A = c_\delta \epsilon_\delta(b_\theta) + \lambda_\delta(b'_{\theta'}). \qquad (42)$$

Lemma 11 yields a subsequential limit $(u_{\delta_j}, v_{\delta_j}) \to (u_0, v_0)$ pointwise on $A \times K$
and uniformly on compact subsets of $[0,a_0[ \times [0,k_0[$, with $u_0(a) = +\infty$ for $a >
a_0 \in [0,\bar a]$ and $v_0(k) = +\infty$ for $k > k_0 \in [0,\bar k]$ and $a_0 \le k_0$. We claim $a_0 = \bar a$.
Recalling the monotonicity of $u_\delta$, if $a_0 < \bar a$ we have $u_{\delta_j}(a) \to +\infty$ uniformly
on $a \in [(a_0 + \bar a)/2, \bar a]$. Since $\bar a \in \operatorname{spt}\alpha$, Fatou's lemma will contradict the
bound (36) unless $a_0 = \bar a$. This also forces equality in $\bar k = \bar a \le k_0 \le \bar k$. Thus
$(u_0, v_0)$ are feasible for the original problem (10).

Extracting a further subsequence if necessary, we may also assume $(\epsilon_{\delta_j}, \lambda_{\delta_j}) \to
(\epsilon_0, \lambda_0)$ weak-* in $C(\bar A \times \bar K)^*$ as $\delta_j \to 0$ to feasible measures for the dual
problem (11). (This compactness argument and topology are also described in
the proof of Lemma 17.) Taking the $\delta \to 0$ limit of (42), Fatou's lemma
combines with the weak-* convergence to give

$$\alpha(u_0) \le c_0 \epsilon_0(b_\theta) + \lambda_0(b'_{\theta'}).$$

Proposition 8 yields the opposite inequality, and its corollary then confirms
the desired optimality of $(u_0, v_0)$ (and of $(\epsilon_0, \lambda_0)$).

Noting Vs = Vs, the inequalities (I38|) - fl3^ survive passage to the ^ 0 
limit in both the distributional (l3^ and a.e. senses. For j = 1 or j = 2, when 
> 1, these inequalities imply d^vs/dk^ > 6^^min{(l — 6'y,{0'yN'} 
throughout K before and hence after the limit. It remains to show the 
identity vs = vs survives the 5* —)■ 0 limit hrst on K, and eventually on K. 

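Although we have only subsequential convergence of $(u_\delta, v_\delta)$, we abuse
notation by writing $\delta \to 0$ to denote this subsequence hereafter. Taking
$\delta \to 0$ in the remaining identity of interest $v_\delta = \tilde v_\delta$ yields

$$v_0 := \lim_{\delta \to 0} v_\delta = \max\{\limsup_{\delta \to 0} v_w^\delta, \; \limsup_{\delta \to 0} v_m^\delta, \; \limsup_{\delta \to 0} v_t^\delta\}. \qquad (43)$$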

Using k~ to denote the limit k"\ k^we claim UQ{k~) < cxo if no(^~) < 00, and 
uo{k~) = cxo if vo(k~) = 00 . The second claim follows from (|8]), which gives 
uo(ci) > the hrst claim is more subtle unless a has a Dirac mass 

at a, but follows from the boundedness of vq in the supremum (|35l) due to 
the following parenthetical paragraph. 

(To see that (35) continues to hold when $\delta = 0$ assuming $\alpha[\{\bar a\}] = 0$,
consider the continuous function $f_\delta(a,k) := u_\delta(a) + \tfrac{1}{N} v_\delta(k) - c_\delta b_E(z(a,k)) -
v_\delta(z(a,k)) \ge 0$ on $A \times K$. The zero set $Z_\delta$ of $f_\delta$ is relatively closed in $A \times \bar K$;
it is non-decreasing by the strict submodularity shown in Lemma 5 and
contains $(A \times \bar K) \cap \operatorname{spt}\epsilon_\delta$ according to Corollary 9. For each $(a_\delta, k_\delta) \in Z_\delta$
this monotonicity implies

$$\int_{[a_\delta,\bar a] \times \bar K} \epsilon_\delta(da,dk) \le \int_{\bar A \times [k_\delta,\bar k]} \epsilon_\delta(da,dk). \qquad (44)$$

Fixing $a_\delta = a$, to establish the limiting case of (35) it is enough to show
$\limsup_{\delta \to 0} k_\delta < \bar k$. Recalling that the left and right marginals of $\epsilon_\delta$ are given
by setting Aa = a — a and Aks = k — ks, from we deduce 

— {aa(]a — Aa,a]) + 6Aa) < a{z#es){[k — Aks,k]) + SAks 

< 6 Aks + {da + 6 H^\A){[a --^—^Aks,a]) 

where the second inequality follows from fl^6l) . Since a G spt a but a [{a}] = 0, 

the left hand side remains bounded away from zero in the limit 5 —?■ 0, whence 

we conclude hminf5_j.o Aks > 0 also. Thus (l35ll holds for a G A with 6 = 0.) 

Now if $v_0(\bar k^-) < \infty$ then Corollary 12 allows us to deduce $\limsup_{\delta \to 0} v_t^\delta \le v_t^0$
for $k \in [0,\bar k[$ from

$$v_t^\delta(k) = N \sup_{a \in [0,\bar a[} \; c_\delta b_E(z(a,k)) + v_\delta(z(a,k)) - u_\delta(a), \qquad (45)$$

noting $u_0(a) \le \liminf_{\delta \to 0} u_\delta(a)$ uniformly on $[0,\bar a[$ and $z(a,k)$ is constrained
to the range where the convergence $v_\delta \to v_0$ is uniform. Showing $\limsup_{\delta \to 0} v_w^\delta \le
v_w^0$ and $\limsup_{\delta \to 0} v_m^\delta \le v_m^0$ is similar but simpler, whence $v_0 \le \max\{v_w^0, v_m^0, v_t^0\}$.
The opposite inequality follows from the constraints satisfied by $(u_0, v_0)$.

If $v_0(\bar k^-) = +\infty$ on the other hand, then for fixed $k \in [0,\bar k[$ let $C_\delta$
denote the supremum of $c_\delta b_E(z(a,k)) + v_\delta(z(a,k))$ over $a \in [0,\bar a[$ and observe
$C_\delta \to C_0 < \infty$ as $\delta \to 0$. Take $\delta_0 > 0$ sufficiently small that $C_\delta < 2C_0$,
and smaller if necessary using Corollary 12 so that $u_\delta(\bar a - \delta_0) > 2C_0$, for
all $\delta < \delta_0$. For $\delta < \delta_0$, the supremum (45) is unchanged if we restrict its
domain $a \in [0,\bar a - \delta_0]$ to an interval where convergence $(u_\delta, v_\delta) \to (u_0, v_0)$ is
uniform. Thus taking $\delta \to 0$ in (45) yields $\lim_{\delta \to 0} v_t^\delta(k) = v_t^0(k)$. A similar but
simpler argument yields $v_w^0(k) = \lim_{\delta \to 0} v_w^\delta(k)$ and $v_m^0(k) = \lim_{\delta \to 0} v_m^\delta(k)$, whence
the desired identity follows from (43).

It costs no generality to replace $u_0$ by the right hand side of (37) with
$\delta = 0$, which is feasible and no larger than $u_0$ in any case. (In fact, they
coincide throughout $A$ by the parenthetical paragraph above.) Let us now
argue that we may take $v_0$ to be continuous, or equivalently take equality to
hold in $v_0(\bar k^-) \le v_0(\bar k)$. If $v_0(\bar k^-) < v_0(\bar k)$, replacing $v_0(\bar k)$ with $v_0(\bar k^-)$ does
not violate any of the feasibility constraints. Nor does it affect the values of
$v_w$, $v_m$, $v_t$ or $u_0$, except to remedy any discontinuity in $v_t$ or $u_0$ by reducing
$v_t(\bar k)$ and $u_0(\bar a)$. This can only improve the objective, and by continuity of
all of the resulting functions extends the identity $v_0 = \tilde v_0$ from $K$ (where
it was already established) to $\bar K$, to complete the proof. ■

2.5 Uniqueness and properties of optimal matchings 

Finally, we are ready to tackle the structure of optimal matchings in the
education and labor sectors, and to give conditions guaranteeing uniqueness
of optimizers for both the primal and dual problems (10)-(11).

The structure of our education sector often leads to positive assortative
matching $\epsilon$ of students with teachers. (Our labor sector always leads to
positive assortative matching of workers to managers.) However, since the
distribution $\kappa$ of cognitive skills acquired by adults in our population is
endogenous, it might not be unique. The following theorem specifies conditions for
uniqueness. These require, in particular, that $\kappa$ as well as the exogenous
distribution of student skills $\alpha$ be atom free. The following lemma details
how $\kappa$ inherits this and other useful properties from the distribution $\alpha$ of
student skills input. Even without positive assortativity, unless the
(exogenous) probability measure $\alpha$ concentrates positive mass at the top skill type
$\alpha[\{\bar a\}] > 0$, it follows that $\kappa$ concentrates no mass at the upper endpoint
of $K = [0,\bar k[$. Then any matching $\epsilon \ge 0$ on $\bar A \times \bar K$ which satisfies the
steady-state constraint $\epsilon^2/N \le z_\#\epsilon$ must concentrate all of its mass on $A \times K$.

Lemma 14 (Endogenous distribution of adult skills) Fix $\theta \in\, ]0,1[$ and
a Borel measure $\alpha \ge 0$ on $\bar A$ with $\alpha[\bar A] < \infty$ for $A = [0,\bar a[$ with $\bar a > 0$. Set
$K = [0,\bar k[ = A$ and $z(a,k) = (1-\theta)a + \theta k$. If $\epsilon \ge 0$ on $\bar A \times \bar K$ has $\epsilon^1 = \alpha$
as its left marginal, then for each $\bar k - \Delta k \in K$ the corresponding distribution
$\kappa = z_\#\epsilon$ of adult skills satisfies

$$\int_{[\bar k - \Delta k, \bar k]} \kappa(dk) \le \int_{[\bar a - \frac{1}{1-\theta}\Delta k, \bar a]} \alpha(da). \qquad (46)$$

Thus $\kappa$ has no atom at $\bar k$ unless $\alpha$ has an atom at $\bar a$.

In addition, if $\epsilon$ is positive assortative and $\alpha$ has no atoms, then $\kappa$ has
no atoms and $\epsilon = (id \times k_t)_\#\alpha$ for some non-decreasing map $k_t : \bar A \to \bar K$,
uniquely determined $\alpha$-a.e. by $\kappa$. Moreover, if $\alpha(da) = \alpha^{ac}(a)da$ is given by
a density $\alpha^{ac} \in L^1(A)$, then $\kappa(dk) = \kappa^{ac}(k)dk$ is given by a related density
$\kappa^{ac} \in L^1(K)$ satisfying

$$\alpha^{ac}(a) = (1 + \theta(k_t'(a) - 1))\, \kappa^{ac}(z(a, k_t(a))) \qquad (47)$$

for Lebesgue-a.e. $a \in A$. In this case $\|\kappa^{ac}\|_{L^\infty(K)} \le \frac{1}{1-\theta}\|\alpha^{ac}\|_{L^\infty(A)}$.

Proof. The definition $\kappa = z_\#\epsilon$ yields $\kappa([\bar k - \Delta k, \bar k]) = \epsilon[z^{-1}([\bar k - \Delta k, \bar k])]$.
Now $\bar k - \Delta k \le z(a,k) \le (1-\theta)a + \theta\bar k$ implies $a \ge \bar k - \frac{1}{1-\theta}\Delta k$. Thus

$$\kappa([\bar k - \Delta k, \bar k]) \le \epsilon([\bar a - \tfrac{1}{1-\theta}\Delta k, \bar a] \times \bar K) = \alpha([\bar a - \tfrac{1}{1-\theta}\Delta k, \bar a])$$

which is the desired bound (46).

For the measure $\epsilon$ to be positive assortative means its support $\operatorname{spt}\epsilon$ is
non-decreasing. Except possibly for a countable number of jump discontinuities,
this support is then contained in the graph of some non-decreasing map
$k_t : \bar A \to \bar K$. If $\alpha$ is free of atoms, the countable set of $a$ where the jumps
occur is a set of measure zero. Then the formula $\epsilon = (id \times k_t)_\#\alpha$ and
uniqueness of $k_t$ are well-known facts, established e.g. in Lemma 3.1 of [1] and
the main theorem of [13]. It follows that $f(a) = z(a, k_t(a))$ is non-decreasing,
and pushes $\alpha$ forward to $\kappa$. By Lebesgue's theorem, $f'(a) = 1 - \theta + \theta k_t'(a)$
exists $H^1$-a.e. and enjoys the positive lower bound $f'(a) \ge 1 - \theta$. Thus $f$
is one-to-one and there is an inverse function $g : \bar K \to \bar A$ with Lipschitz
constant at most $\frac{1}{1-\theta}$ such that $g(\bar f(a)) = a$ for any non-decreasing extension
$\bar f : \bar A \to \bar K$ of $f$ (to points where $k_t(a)$ may not be differentiable). For
$K' \subset K$ we have $\kappa[K'] = \alpha[f^{-1}(K')] = \alpha[g(K')]$. Taking $K'$ to consist of
any single point shows $\kappa$ has no atoms if $\alpha$ has no atoms. Taking $K'$ to be an
arbitrary set of Lebesgue measure zero shows $\kappa$ absolutely continuous with
respect to Lebesgue if $\alpha$ is absolutely continuous with respect to Lebesgue,
noting $H^1[g(K')] \le \frac{1}{1-\theta}H^1[K']$. The formula $\alpha^{ac}(a) = f'(a)\kappa^{ac}(f(a))$ then
follows essentially from the fundamental theorem of calculus, and is argued
rigorously in [13]. The bound $\|\kappa^{ac}\|_{L^\infty(K)} \le \frac{1}{1-\theta}\|\alpha^{ac}\|_{L^\infty(A)}$ is a consequence,
so the proof is complete. ■
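
The change of variables behind (47) and the density bound of Lemma 14 can be sanity-checked numerically. The following is a minimal sketch rather than part of the argument: the value θ = 1/2, the uniform student density on [0,1[ and the teacher assignment k_t(a) = a² are illustrative assumptions, and all function names are hypothetical.

```python
# Numerical sanity check of identity (47) and the density bound of Lemma 14
# for one hypothetical positive assortative matching (illustrative values).

THETA = 0.5  # teaching effectiveness theta in ]0,1[ (assumed value)


def k_t(a):
    """Hypothetical non-decreasing teacher assignment on A = [0, 1[."""
    return a * a


def f(a):
    """Adult skill z(a, k_t(a)) = (1 - theta) a + theta k_t(a)."""
    return (1.0 - THETA) * a + THETA * k_t(a)


def f_prime(a):
    """f'(a) = 1 + theta (k_t'(a) - 1), with k_t'(a) = 2a in this sketch."""
    return 1.0 + THETA * (2.0 * a - 1.0)


def alpha_density(a):
    """Student density alpha^ac; uniform on [0, 1[ in this sketch."""
    return 1.0


def kappa_density(a):
    """kappa^ac evaluated at the adult skill f(a), solved from (47)."""
    return alpha_density(a) / f_prime(a)


def kappa_mass(a_lo, a_hi, steps=2000):
    """Mass kappa([f(a_lo), f(a_hi)]) by midpoint quadrature in k = f(a)."""
    total = 0.0
    h = (a_hi - a_lo) / steps
    for i in range(steps):
        a_mid = a_lo + (i + 0.5) * h
        dk = f(a_lo + (i + 1) * h) - f(a_lo + i * h)
        total += kappa_density(a_mid) * dk
    return total
```

In this sketch `kappa_mass(0.2, 0.7)` should reproduce the student mass α([0.2, 0.7]) = 0.5, since f pushes α forward to κ, and `kappa_density` never exceeds 1/(1 - θ) times the student density.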

Theorem 15 (Positive assortative and unique optimizers) Fix $c \ge 0$
and positive $\theta, \theta', N, N'$ and $\bar a$ with $\max\{\theta,\theta'\} < 1 < N$. Set $A = [0,\bar a[$ and
let $\alpha$ be a Borel probability measure on $\bar A$ satisfying the doubling condition
at $\bar a \in \operatorname{spt}\alpha$. Define $z(a,k) = (1-\theta)a + \theta k$, $b_\theta = b_E \circ z$ and $b'_{\theta'}(a,k) =
b_L((1-\theta')a + \theta' k)$ where $b_{E/L} \in C^1(\bar K)$ satisfy (1)-(3). If $\epsilon, \lambda \ge 0$ on
$\bar A^2$ maximize the dual problem (11), then the labor matching $\lambda$ is positive
assortative. Moreover, there exist a pair of maximizers $(\epsilon, \lambda)$ for which the
educational matching $\epsilon$ is also positive assortative.

If there exist minimizing payoffs $(u,v) \in F_0$ for the primal problem (10)
which are non-decreasing and strictly convex (as for example if either $c > 0$
or $N\theta^2 \ge 1$), then any maximizing $\epsilon$ and $\lambda$ are positive assortative. If, in
addition, $\alpha$ is free from atoms then the maximizing $\epsilon$ and $\lambda$ are unique. If, in
addition, hypotheses (d)-(f) from Proposition 7 hold, then $u'$ and $v'$ exist and
are uniquely determined $\alpha$-a.e. and $(z_\#\epsilon)$-a.e. respectively. If, in addition,
$\alpha$ dominates some absolutely continuous measure whose support fills $A$, and
$(u_0, v_0) \in F_0$ is any other minimizer with $v_0 : A \to \mathbf{R}$ locally Lipschitz then
$u_0 = u$ holds $\alpha$-a.e., meaning $u_0$ is unique.

Proof. Set $K = [0,\bar k[ = A$. Existence of a maximizing pair $(\epsilon, \lambda)$ is
asserted by Lemma 17. Let us begin by showing that they are positive
assortative under the extra condition that minimizing payoffs $(u,v)$ exist for
(10) which are strictly convex. Lemma 5 asserts $v(z(a,k))$ is then strictly
supermodular.

Set $f(a,k) = u(a) + \tfrac{1}{N}v(k) - c\,b_E(z(a,k)) - v(z(a,k)) \ge 0$ and $g(k',k) =
v(k') + \tfrac{1}{N'}v(k) - b'_{\theta'}(k',k) \ge 0$ on $\bar A \times \bar K$, with the convention $f(\bar a,\bar k) \le 0$ if
$v(\bar k) = +\infty$, and vanishing if and only if $u(\bar a) = +\infty$ in addition. Corollary 9
asserts $\epsilon(f) = 0$ and $\lambda(g) = 0$ for any dual maximizers $(\epsilon, \lambda)$. Thus $\epsilon$ and $\lambda$
must vanish outside the respective zero sets $F \subset \bar A \times \bar K$ of $f$ and $G \subset \bar K^2$ of
$g$.

When $f$ and $g$ are strictly submodular, then $F$ and $G$ are non-decreasing
in the plane, meaning $\lambda$ and $\epsilon$ are positive assortative. This strict
submodularity follows from that of $-b_E(z(a,k))$ and $-v(z(a,k))$.
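
The link between strict sub/supermodularity and non-decreasing supports has a simple finite analogue, sketched below under stated assumptions. It is not the argument of the proof: the surplus a·k is a hypothetical strictly supermodular stand-in (its negative is strictly submodular), the skill values are made up, and the function names are illustrative. Brute force over all one-to-one matchings then selects the monotone one.

```python
from itertools import permutations


def surplus(a, k):
    """Hypothetical strictly supermodular surplus (mixed difference > 0)."""
    return a * k


def best_matching(students, teachers):
    """Brute-force the one-to-one matching that maximizes total surplus."""
    best = max(
        permutations(teachers),
        key=lambda perm: sum(surplus(a, k) for a, k in zip(students, perm)),
    )
    return list(zip(students, best))


def is_non_decreasing(pairs):
    """True if the matched pairs form a non-decreasing subset of the plane."""
    ordered = sorted(pairs)
    return all(k1 <= k2 for (_, k1), (_, k2) in zip(ordered, ordered[1:]))
```

For instance, with students `[0.1, 0.4, 0.7, 0.9]` and teachers `[0.8, 0.2, 0.95, 0.5]`, the maximizer pairs each student with the teacher of the same rank, which is the discrete counterpart of a positive assortative matching.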

Finally, assume in addition that $\alpha$ is atom free. If $(\epsilon_i, \lambda_i)$ are dual
maximizers, for $i = 0,1$, then so is their average $(\epsilon_2, \lambda_2) := (\epsilon_0 + \epsilon_1, \lambda_0 + \lambda_1)/2$.
Thus $\epsilon_2$ vanishes outside the non-decreasing set $F$, as do $\epsilon_{0/1}$. Similarly $\lambda_i$
all vanish outside the same non-decreasing set $G$ for $i = 0,1,2$. This strongly
suggests the asserted uniqueness, an intuition we now make precise. Except
perhaps for a countable number of vertical segments, the non-decreasing set
$F$ is contained in the graph of a non-decreasing map $k_t : \bar A \to \bar K$. Any
joint measure $\epsilon$ with $\epsilon^1 = \alpha$ cannot charge these vertical segments, since
this would imply $\alpha$ has atoms. Since our maximizers $\epsilon_i$ vanish outside the
graph of $k_t$, we conclude they must coincide with the measure $(id \times k_t)_\#\alpha$ by
Lemma 3.1 of [1]. This identification shows $\epsilon_0 = \epsilon_1$. The associated
distributions $\kappa = z_\#\epsilon_0$ and $\kappa_t = (\epsilon_0)^2/N$ of adult and teacher skills are therefore
also unique. Moreover, $\kappa$ is free from atoms, according to Lemma 14.

Let $\lambda_i^1$ and $\lambda_i^2$ be the left and right marginals of each maximizer $\lambda_i \ge 0$
for the labor sector, whose feasibility implies $\lambda_i^1 + \lambda_i^2/N' = \kappa - \kappa_t$ is also
atom-free. Let $\Delta\lambda = \lambda_0 - \lambda_1$ denote the difference of the two maximizers.
Recall that both $\lambda_i$ (and hence $\Delta\lambda$) must vanish outside the same
non-decreasing set $G$. Just as before, the non-decreasing set $G$ has at most
countably many horizontal and vertical segments, which $\lambda_i$ cannot charge
since its marginals are free from atoms. Now the positive marginals AA^ : = 
((AA)+)^ = ((AA)^)+ and AA^ of the difference must have the same mass, 
since the atom-free condition precludes cancellations. On the other hand, 
feasibility implies iVAA^ — NAXl_ + AA^ — AA^ = 0, which forces iVAA^ = 
AX't (and NAXl_ = AA^). Since these two measures have the same mass, 
N ^ 1 produces a contradiction unless AA = 0. If A = 1 so that all adults 
are teachers, then A* = 0. This establishes the uniqueness asserted for the 
dual problem. 

Having established the existence of positive assortative maximizers when
$v$ is strictly convex, we now turn to the case that strict convexity fails.
According to Theorem 13 this happens only when $c = 0$ and $N\theta^2 < 1$, so we
can approximate this situation as a $c \to 0$ limit. Let $(\epsilon_c, \lambda_c)$ and $(u_c, v_c)$ be
the (non-negative) optimizers described above for the problem with $c > 0$,
so that $c\,\epsilon_c(b_\theta) + \lambda_c(b'_{\theta'}) = \alpha(u_c)$ according to Remark 10. Using the
Banach-Alaoglu theorem as in the proof of Lemma 17 and the compactness results of
Lemma 11, we extract a subsequential limit $(\epsilon_c, \lambda_c) \to (\epsilon, \lambda)$ in the weak-*
topology on $C(\bar A \times \bar K)^*$ and $(u_c, v_c) \to (u,v)$ locally uniformly on $[0,a_0[$, with
$u(a) = +\infty = v(a)$ for all $a > a_0$. The limiting pairs are feasible for the
primal and dual problems respectively, and positive assortativity survives the
limiting process [13]. Fatou's lemma allows us to take the subsequential limit
of $c\,\epsilon_c(b_\theta) + \lambda_c(b'_{\theta'}) = \alpha(u_c)$ to arrive at $\lambda(b'_{\theta'}) \ge \alpha(u)$. The reverse inequality
is asserted by Proposition 8 and confirms optimality of $(\epsilon, \lambda)$ by Corollary 9.

We now address uniqueness of the primal minimizers. Since $u$ and $v$
are strictly convex, both are continuous functions with one-sided derivatives
throughout $K$, and two-sided derivatives except perhaps at countably many
points. Define $u(\bar a) = \lim_{a \to \bar a} u(a)$ and $v(\bar k)$ similarly. Since the measures $\alpha$
and $z_\#\epsilon$ have no atoms, the asserted derivatives of $u$ and $v$ exist. Denote
the distribution of workers and managers by $\kappa_w := \pi^1_\#\lambda$ and $\kappa_m := \pi^2_\#\lambda/N'$.
The projections of $\operatorname{spt}\lambda$ through $\pi^1(a,k) = a$ and $\pi^2(a,k) = k$ are compact
sets of full measure for $\kappa_w$ and $\kappa_m$ respectively. Take $\operatorname{Dom} v' \subset\, ]0,\bar k[$ by
convention. For each $k' \in \pi^1(\operatorname{spt}\lambda) \cap \operatorname{Dom} v'$, there is a unique $k \in \bar K$
with $(k',k) \in \operatorname{spt}\lambda \subset G$. The first-order condition $g_{k'}(k',k) = 0$ then gives
$v'(k') = (1-\theta')\,b_L'((1-\theta')k' + \theta' k)$; by strict convexity of $b_L$ there cannot
be two such $k$ without differentiability of $v$ failing at $k'$. This shows $v'$ to
be uniquely determined by $\lambda$ throughout $\pi^1(\operatorname{spt}\lambda) \cap \operatorname{Dom} v'$ (a set of full
$\kappa_w$ measure). A similar argument with the roles of $k'$ and $k$ interchanged
shows $v'(k) = N'\theta'\,b_L'((1-\theta')k' + \theta' k)$ is uniquely determined by $\lambda$ on the
set $\pi^2(\operatorname{spt}\lambda) \cap \operatorname{Dom} v'$ containing $\kappa_m$-a.e. manager type $k$.

To address $v'(k)$ for the teacher types $k$, assume hypotheses (d)-(f) of
Proposition 7. For $k_1 \in \operatorname{spt}\kappa_t \cap \operatorname{Dom} v'$, that proposition provides a
recursive formula asserting $k_2 \in \operatorname{Dom} v'$, and relating $v'(k_1)$ to $v'(k_2)$, where
$(a_1, k_1) \in \operatorname{spt}\epsilon$ and $k_2 = z(a_1, k_1)$ is the skill of those adults who were trained
by type $k_1$ teachers. The strict monotonicity of $v'(k)$ we have assumed
implies $a_1$ and $k_2$ are unique. The proposition also asserts that after a finite
number $d$ of iterations, this recursion terminates with an adult of skill $k_d$ who
is willing to become a worker or a manager, and whose wage gradient $v'(k_d)$
is therefore determined by the considerations above. Thus $v'(k_1)$ is uniquely
determined by $\epsilon$, $\lambda$, and (22). This establishes the $\kappa$-a.e. uniqueness of the
wage gradient $v'$.

Finally, we turn to the net lifetime surplus $u(a)$ of student type $a \in\, ]0,\bar a[$.
For $a \in \pi^1(\operatorname{spt}\epsilon) \cap \operatorname{Dom} u'$, there exists $k \in \bar K$ (which we'll show to be
unique) such that $(a,k) \in \operatorname{spt}\epsilon \subset F$. The first-order conditions for one-sided
derivatives $\pm f_a(a^\pm, k) \ge 0$ give

$$v'(z(a,k)^-) + c\,b_E'(z(a,k)^-) \ge \frac{u'(a)}{1-\theta} \ge v'(z(a,k)^+) + c\,b_E'(z(a,k)^+).$$

However, the convexity of $v$ on $]0,\bar k[$ asserts $v'(z^-) \le v'(z^+)$ and similarly for
$b_E$, so both $v$ and $b_E$ must be differentiable at $z(a,k)$ and equalities hold
throughout. Thus

$$v'(z) + c\,b_E'(z) = \frac{u'(a)}{1-\theta}.$$

Since the left hand side is strictly increasing in $z$, we find $z(a,k)$ and hence
$k$ is unique. Since $v'$ was uniquely determined for $z_\#\epsilon$-a.e. adult type, it follows
that $u'$ is uniquely determined for $\alpha$-a.e. student type. If $\alpha$ dominates some
absolutely continuous measure whose support fills $A$, this shows $u$ is unique
up to an additive constant. Given another feasible minimizer $(u_0, v_0)$ with
$v_0$ locally Lipschitz, we see $u_0$ must produce equality $\alpha$-a.e. in the inequality
(37): otherwise replacing $u_0$ by the right-hand side would remain feasible
and lower the objective (10). On the other hand, the right hand side is
locally Lipschitz, according to Lemma 2. The arguments above then yield
$u_0 = u + \text{const}$. But the constant must vanish since both minimizers yield
the same value for the objective functional, showing $u_0$ is unique in $L^1(A,\alpha)$.

2.6 Phase transition to unbounded wage gradients

Having come this far, one may wonder whether establishing the existence of
competitive equilibria need be so involved. If we had been content to find
optimizing wages $u$ and $v$ which are merely non-decreasing, an argument
based on Helly's selection theorem might have sufficed. However, we would
not then know the convexity of the wages (used to prove their uniqueness),
nor positive assortativity of the education sector.

In this section, we explore the actual behavior of $v(k)$ near the top skill
type $\bar k$, assuming the distribution of student types is given by a continuous
density $\alpha(da) = \alpha^{ac}(a)da$ on $A = K = [0,\bar k[$. Under mild differentiability
hypotheses, our next theorem establishes the existence of a phase transition
separating bounded from unbounded wage gradients. For $N\theta \ge 1$, it shows
the education sector may form into a pyramid scheme in which the marginal
wage $v'(k)$ diverges to infinity as $k \to \bar k$, even though the absolute wage
$v(k)$ remains bounded. For $N\theta \ne 1$, it gives precise asymptotics (48) for
the wage function $v(k)$ and the endogenous distribution $\kappa^{ac}(k)$ of adult skills
near $\bar k$. Notice this formula makes an explicit quantitative prediction for
the dependence of the rate of divergence on the teaching capacity $N$ and
effectiveness $\theta$ assumed in the model. In all cases this divergence is integrable,
so the wages tend to a finite limit. For $N\theta < 1$ it predicts a specific limiting
slope $v'(k) \to c\,b_E'(\bar k)/(\frac{1}{N\theta} - 1)$ as $k \to \bar k$, while for $N\theta > 1$ it predicts $v'(k) \to \infty$ at
a specific rate. Thus the difference in marginal wages amongst the very top
echelons of teachers ('gurus') is negligible in a thin (or equivalently, vertical)
pyramid $N\theta < 1$, but becomes more and more exaggerated if $N\theta > 1$, and
at a rate which increases with $N\theta$, corresponding to a fatter and fatter (or
equivalently, more and more horizontal) organizational structure with wider
effective span of control. When the theorem applies, it also predicts that
the density of adults (= teachers) at the highest skill level $\bar k = \bar a$ tends to a
constant multiple of the density of students.
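
To make the dependence on the teaching capacity N and effectiveness θ concrete, the following sketch tabulates the exponent p = log(Nθ)/log N that governs the rate of divergence in the asymptotics (48). It is a hedged illustration only: the function names and regime labels are hypothetical, and the integrability remark rests on the elementary fact that x^(-p) is integrable near 0 exactly when p < 1, which holds here because θ < 1.

```python
import math


def divergence_exponent(N, theta):
    """Exponent p = log(N*theta)/log(N) in v'(k) ~ const / |kbar - k|**p."""
    if not (N > 1 and 0 < theta < 1):
        raise ValueError("the model assumes 0 < theta < 1 < N")
    return math.log(N * theta) / math.log(N)


def regime(N, theta):
    """Classify the wage gradient near the top skill type (illustrative labels)."""
    p = divergence_exponent(N, theta)
    if p > 0:
        return "unbounded gradient"
    if p < 0:
        return "bounded gradient"
    return "critical"


def integral_of_power(p):
    """Integral of x**(-p) over ]0, 1], finite because p < 1."""
    if p >= 1:
        raise ValueError("x**(-p) is not integrable near 0 when p >= 1")
    return 1.0 / (1.0 - p)
```

For example, N = 4 and θ = 1/2 give p = 1/2, an unbounded but integrable gradient, whereas N = 2 and θ = 1/4 give a negative exponent and hence a bounded gradient.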

Theorem 16 (Wage behavior and density of top-skilled adults) Fix
$c \ge 0$ and positive $\theta, \theta', N, N'$ and $\bar a = \bar k$ with $\max\{\theta,\theta'\} < 1 < N$. Let $\alpha$
be given by a Borel probability density $\alpha^{ac} \in L^\infty(A)$ which is continuous and
positive at the upper endpoint of $A = [0,\bar a[$. Set $z(a,k) = (1-\theta)a + \theta k$,
$b_\theta = b_E \circ z$ and $b'_{\theta'}(a,k) = b_L((1-\theta')a + \theta' k)$, where $b_{E/L} \in C^2(\bar K)$ satisfy
(1)-(3). Suppose $(\epsilon, \lambda)$ and convex $(u,v) \in F_0$ optimize the primal and dual
problems (10)-(11), and (i) $\bar k \in (\operatorname{spt}\epsilon^2) \setminus \operatorname{spt}(\lambda^1 + \lambda^2)$, meaning all adults
with sufficiently high skills become teachers; (ii) the educational matching $\epsilon$
is positive assortative, meaning a non-decreasing correspondence $k = k_t(a)$
relates the ability of $\alpha$-a.e. student $a$ to that of his teacher; (iii) $k_t$ is
differentiable at $\bar a$, and (iv) $v$ is differentiable on some interval $]\bar k - \delta, \bar k[$. Then
for $N\theta \ne 1$,

$$v'(k) = \frac{\text{const}}{|\bar k - k|^{\frac{\log N\theta}{\log N}}} - \frac{c\,b_E'(\bar k)}{1 - \frac{1}{N\theta}} + o(1) \qquad (48)$$

as $k \to \bar k$, and the steady state distribution $\kappa = z_\#\epsilon$ of adult skills satisfies

$$\kappa^{ac}(\bar k) := \lim_{\delta \to 0} \frac{1}{\delta}\,\kappa([\bar k - \delta, \bar k]) = \frac{1 - \theta/N}{1 - \theta}\,\alpha^{ac}(\bar a). \qquad (49)$$

Proof. As in Lemma 14, hypothesis (ii) implies some non-decreasing
function $k_t : A \to \bar K$ gives the equilibrium matching of students with
teachers, so that $k_\theta(a) = (1-\theta)a + \theta k_t(a)$ gives the matching of student ability
with human capital acquired when the student grows up. Then $(k_\theta)_\#\alpha = \kappa$
and $(k_t)_\#\alpha = N\kappa_t$, where $\kappa = \kappa_w + \kappa_m + \kappa_t$ gives the distribution of
adult skill types on $K$, as a sum of the distributions of worker, manager and
teacher skill types. Now


$$N k_t'(a)\,\kappa_t^{ac}(k_t(a)) = \alpha^{ac}(a), \qquad (50)$$

$$\text{and} \quad k_\theta'(a)\,\kappa^{ac}(k_\theta(a)) = \alpha^{ac}(a) \qquad (51)$$

is known to hold for a.e. $a \in A$. In particular, techniques of [13] can be used
to show it holds at $a = \bar a$ provided $k_t'(\bar a)$ (and hence $k_\theta'(\bar a)$) exists (iii) and
are non-vanishing. On the other hand, the upper bound $\|\kappa^{ac}\|_{L^\infty} < \infty$ from
Lemma 14 gives a positive lower bound for $k_t'(a)$ near $\bar a$ a.e. in (50), which
precludes the possibility that $k_t'(\bar a) = 0$.

From (i) and the steady state constraint $\kappa = z_\#\epsilon$ we have
$k_t(\bar a) = \bar k = k_\theta(\bar a)$ and $\kappa^{ac}(\bar k) = \kappa_t^{ac}(\bar k)$. From (50)-(51) we conclude $N k_t'(\bar a) =
k_\theta'(\bar a)$. On the other hand, differentiating $k_\theta(a) = (1-\theta)a + \theta k_t(a)$ yields
$k_\theta'(\bar a) = 1 + \theta(k_t'(\bar a) - 1) \ge 1 - \theta$. Solving this linear system of two equations
in two unknowns gives $k_t'(\bar a) = \frac{1}{N}k_\theta'(\bar a)$ and

$$k_\theta'(\bar a) = \frac{1 - \theta}{1 - \theta/N}. \qquad (52)$$

(51) now implies (49).

Next we consider the equilibrium wage $v(k)$ of each type of adult and
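payoff $u(a)$ to each type of student. The stability constraint asserts $u(a) +
\tfrac{1}{N}v(k) - v(z(a,k)) - c\,b_E(z(a,k)) \ge 0$ for all $a$ and $k$, with equality holding
when $k = k_t(a) = \theta^{-1}k_\theta(a) + (1 - \tfrac{1}{\theta})a$. The first-order condition in $k$ for this
non-negative function to attain its minimum gives

$$v'(k_t(a)) = N\theta\,\big(v'(k_\theta(a)) + c\,b_E'(k_\theta(a))\big).$$

Taylor expanding $k_\theta(\bar a - \Delta a) = \bar k - k_\theta'(\bar a)\Delta a + o(\Delta a)$ using $k_\theta'(\bar a)$ from (52),
we find a recursive relation for $v'(k)$ near $\bar k$:

$$v'\Big(\bar k - \tfrac{1}{N}\tfrac{1-\theta}{1-\theta/N}\Delta a + o(\Delta a)\Big) = N\theta\,\big[v' + c\,b_E'\big]_{k = \bar k - \frac{1-\theta}{1-\theta/N}\Delta a + o(\Delta a)}.$$

Neglecting the $o(\Delta a)$ terms and setting $f(x) := v'(\bar k - x)/c + (1 - \tfrac{1}{N\theta})^{-1}b_E'(\bar k) -
(1 - \tfrac{1}{N^2\theta})^{-1}b_E''(\bar k)x$, the recursion simplifies to $f(x/N) = N\theta f(x)$ which is solved
by constant multiples of $f(x) = x^{-\frac{\log N\theta}{\log N}}$. Thus, to leading order

$$v'(\bar k - \Delta k) = \text{const}\,|\Delta k|^{-\frac{\log N\theta}{\log N}} - \frac{c\,b_E'(\bar k)}{1 - \frac{1}{N\theta}} + \frac{c\,b_E''(\bar k)}{1 - \frac{1}{N^2\theta}}\,\Delta k.$$

Either the first or the second summand dominates this expression as $\Delta k \to 0$,
depending on the sign of $N\theta - 1$. One might worry that const depends on the
sequence along which the recursion is solved, but for $N\theta > 1$ the monotonicity
of $v'$ precludes this, to yield the desired identity (48). ■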
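
The two computational steps of this proof, solving the linear system for the slopes at the top skill type and iterating the wage-gradient recursion, can be checked with a short sketch. It works under simplifying assumptions that are not in the proof: b_E' is frozen at a constant value `b1` (so b_E'' = 0 and the o(Δa) terms are dropped), and all names and parameter values are hypothetical.

```python
import math
from fractions import Fraction


def top_slopes(N, theta):
    """Solve N*kt' = ktheta' and ktheta' = 1 - theta + theta*kt' exactly."""
    N, theta = Fraction(N), Fraction(theta)
    kt_prime = (1 - theta) / (N - theta)
    ktheta_prime = N * kt_prime  # equals (1 - theta) / (1 - theta / N)
    return kt_prime, ktheta_prime


def iterate_gradient(N, theta, c, b1, w0, x0, steps):
    """Iterate w -> N*theta*(w + c*b1) while the distance x to kbar shrinks by N.

    Here w stands for v'(kbar - x) and b1 for an assumed constant value of b_E'.
    """
    xs, ws = [x0], [w0]
    for _ in range(steps):
        xs.append(xs[-1] / N)
        ws.append(N * theta * (ws[-1] + c * b1))
    return xs, ws


def leading_order(N, theta, c, b1, w0, x0, x):
    """Closed form const * x**(-p) - c*b1/(1 - 1/(N*theta)), for N*theta != 1."""
    p = math.log(N * theta) / math.log(N)
    offset = -c * b1 / (1.0 - 1.0 / (N * theta))
    const = (w0 - offset) * x0 ** p
    return const * x ** (-p) + offset
```

With N = 3 and θ = 1/2, `top_slopes` returns (1/5, 3/5), in agreement with (52). Along the sequence x_n = x_0/N^n the iterates of `iterate_gradient` coincide with `leading_order`, growing like x^(-log Nθ/log N) when Nθ > 1 and converging to c·b1/(1/(Nθ) - 1) when Nθ < 1.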

Some remarks concerning hypotheses (i)-(iv): Proposition 7 ensures (i)
holds if $N'\theta'$ and $N\theta$ are large enough, while Theorem 15 ensures (ii) holds
when $c > 0$, and can be selected otherwise. We do not know conditions which
guarantee (iii)-(iv), since differentiability may fail for $k_t(a)$ on a set of zero
measure, and for $v(k)$ at a countable number of points. We can however
ensure that $k_t$ is bi-Lipschitz by combining the lower bound on its derivative
from Lemma 14 with the upper bound provided by Proposition 3 in case
$N\theta > 1$. This makes failure of (iii) seem unlikely, since the value of $k_t'(a)$
would have to oscillate between these positive bounds, producing a reciprocal
oscillation in $\kappa(k)$ near $\bar k$. Similarly, the alternative to (iv) is that jump
discontinuities in the monotone function $v'(k)$ accumulate at $\bar k$. At least one
of the three types of singular behavior must occur, and (48) seems the most
likely, especially given its consistency with the divergence predicted by
Proposition 7. To be absolutely correct, however, one should say Theorem 16
provides strong evidence in favor of a phase transition with wage gradients
diverging if and only if $N\theta \ge 1$, where the leading order behavior of (48)
changes. The theorem also provides concrete quantitative predictions which
can be investigated numerically.

A Optimal plans and absence of a duality gap 

This appendix establishes the existence of measures achieving the maximum
$LP^*(\delta)$ in the original (11) and $\delta$-perturbed dual problem (21), and verifies
the absence $LP^*(\delta) = LP_*(\delta)$ of a duality gap. While such claims are natural
analogs to duality results well-known in finite-dimensional linear programming,
in our infinite-dimensional context they will remain true only if we are
careful to choose the correct functional analytic setting. These choices are
made clear in the proofs of the following statements.

Lemma 17 (Existence of optimal measures) Fix $\delta, c_\delta$ non-negative and
$\theta, \theta', N, N'$ positive with $\max\{\theta,\theta'\} < 1 < N$ and $N' > 1$. Let $\alpha$ be a Borel
probability measure on $\bar A$, where $A = [0,\bar a[ = K$ with $0 < \bar a = \bar k \in \operatorname{spt}\alpha$, and
define $z(a,k) = (1-\theta)a + \theta k$, $b_\theta = b_E \circ z$ and $b'_{\theta'}(a,k) = b_L((1-\theta')a + \theta' k)$,
where $b_{E/L} \in C(\bar K)$. Then there exist feasible measures $\epsilon \ge 0$ and $\lambda \ge 0$
on $\bar A^2$ maximizing the dual problem (21).

Proof. As we now describe, existence of a maximizing $\epsilon$ and $\lambda$ follows
from a standard compactness and continuity argument. The continuous
functions $C(\bar A^2)$ on the compact square $\bar A^2$ form a Banach space when
equipped with the supremum norm $\|\cdot\|_\infty$. Borel probability measures form
a weak-* compact subset of the dual Banach space, according to the
Riesz-Markov and Banach-Alaoglu theorems. A sequence $\epsilon_i \to \epsilon_\infty$ converges in the
weak-* topology if and only if the integral $\epsilon_i(f)$ of each continuous function
$f \in C(\bar A^2)$ against $\epsilon_i$ converges to the integral of $f$ against $\epsilon_\infty$. Feasibility of
$\lambda, \epsilon \ge 0$ asserts

U^)a+ [ f{a)a{da) = f f{a)e{da,dk) and 
JA JAxK 

[ [f{k') + ^f{k)]X{dk',dk) = (/5)k + / [f{z{a,k))-j^f{k)]e{da,dk) 

Jk^ Jaxk 

for each $f \in C(\bar A)$. Thus the feasible pairs form a weak-* compact subset of
$C(\bar A^2)^*$. Since $b_\theta, b'_{\theta'} \in C(\bar A^2)$, the linear functional we are trying to maximize
is weak-* continuous, hence its maximum must be attained, provided the set
of feasible measures $(\epsilon, \lambda)$ is non-empty. To see the feasible set is non-empty,
let e concentrate on the diagonal: e = {id x id)#{a + Then the 

marginals of e coincide with k := z^e = a + i^^H^\Ai since z{a, a) = a. 

Choosing A := {id x id)^{-^^H^\A) dehnes a feasible pair. ■ 

The next theorem addresses the absence of a duality gap. It is proved
using a generalization of the Fenchel-Rockafellar duality theorem found in
Borwein and Zhu [3] (and pointed out to us by Yann Brenier). As in the
preceding lemma, the Fenchel-Rockafellar theorem will involve the duality
between measures and continuous, bounded functions. On the other hand,
$LP_*(\delta)$ is necessarily defined by an infimum over a larger class of
functions $F_\delta$ including some unbounded ones. Thus the
Fenchel-Rockafellar theorem by itself yields only an inequality
$LP_*(\delta) \le LP^*(\delta)$ and not the desired equality. Fortunately,
the complementary inequality is established in Proposition 8.

Theorem 18 (No duality gap) Fix $\delta, c$ non-negative and
$\theta, \theta', N, N'$ and $\bar a = \bar k$ positive with
$\max\{\theta, \theta'\} < 1 < N$. Let $A = [0, \bar a[ = K$ and $\alpha$ be
a Borel probability measure on $A$ satisfying the doubling condition at
$\bar a$, and define $z(a,k) = (1-\theta)a + \theta k$,
$b_\theta = b_E \circ z$ and
$b'_{\theta'}(a,k) = b_L((1-\theta')a + \theta' k)$ where
$b_{E/L} \in C(\bar K)$. Then the values $LP^*(\delta) = LP_*(\delta)$ of the
infimum and supremum (21) coincide.

Proof. Let $H : Z \to Z^*$ be a bounded linear transformation between
a Banach space $Z$ and its dual $Z^*$, on which convex functions
$\psi : Z \to \mathbf{R} \cup \{+\infty\}$ and
$\phi : Z^* \to \mathbf{R} \cup \{+\infty\}$ are defined. Let
$\mathop{\rm Dom} \psi := \{z \in Z \mid \psi(z) < \infty\}$. Define the
Legendre transform of $\phi$ by

\[
\phi^*(z) := \sup_{z^* \in Z^*} \langle z, z^* \rangle - \phi(z^*) \tag{53}
\]

on $z \in Z$ and analogously $\psi^*$ on $Z^*$. Here
$\langle z, z^* \rangle$ denotes the duality pairing. If $\phi$ is
continuous and real-valued at some point in $H(\mathop{\rm Dom} \psi)$, then
pp. 135-137 of [3] asserts

\[
\inf_{z \in Z} \psi(z) + \phi(Hz)
  = \max_{z^* \in Z^*} -\psi^*(H^* z^*) - \phi^*(-z^*).
\]
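
Only the equality, and the attainment of the maximum, rely on the continuity
hypothesis; the inequality $\ge$ between the two sides holds for any pair of
convex functions. A short derivation may make this visible. It is a sketch
in which $\psi$ and $\phi$ denote the two convex functions, the spaces are
left generic ($z$ ranges over the domain of $H$ and $z^*$ over the dual of
its target), and $\psi(z) + \phi(Hz)$ is assumed finite:

```latex
% Fenchel-Young inequalities, read off from the definition of the
% Legendre transform as a supremum:
\begin{align*}
\psi^*(H^* z^*) &\ge \langle z, H^* z^* \rangle - \psi(z), \\
\phi^*(-z^*)    &\ge \langle Hz, -z^* \rangle - \phi(Hz).
\end{align*}
% Add the two lines; the pairings cancel because
% <z, H^* z^*> = <Hz, z^*> by definition of the adjoint.
\[
\psi(z) + \phi(Hz) \;\ge\; -\psi^*(H^* z^*) - \phi^*(-z^*)
\qquad \text{for all } z, z^* .
\]
% Taking the infimum over z and the supremum over z^*:
\[
\inf_{z} \, \psi(z) + \phi(Hz)
  \;\ge\; \sup_{z^*} \, -\psi^*(H^* z^*) - \phi^*(-z^*).
\]
```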

In our case

\[
\psi_\delta(u,v) = \delta \langle u + v \rangle_{\bar a}
  + \int_{[0,\bar a]} u(a)\,\alpha(da)
\]
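so

\[
\psi_\delta^*(\mu,\nu) =
\begin{cases}
0 & \text{if } (\mu,\nu) = \big(\alpha + \frac{\delta}{|A|}\mathcal{H}^1|_A,\;
    \frac{\delta}{|K|}\mathcal{H}^1|_K\big), \\
+\infty & \text{else,}
\end{cases}
\]

while

\[
\phi(u,v) =
\begin{cases}
0 & \text{if } u \ge c\,b_\theta \text{ and } v \ge b'_{\theta'}, \\
+\infty & \text{else;}
\end{cases}
\]

so

\[
\phi^*(\epsilon,\lambda) =
\begin{cases}
c\,\epsilon(b_\theta) + \lambda(b'_{\theta'})
  & \text{if } \epsilon \le 0 \text{ and } \lambda \le 0, \\
+\infty & \text{else;}
\end{cases}
\]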

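and $H : C(\bar A) \oplus C(\bar K) \to C(\bar A \times \bar K) \oplus
C(\bar K \times \bar K)$ is given by

\[
H \begin{pmatrix} u \\ v \end{pmatrix}
  = \begin{pmatrix}
      u(a) + \frac{1}{N} v(k) - v(z(a,k)) \\
      v(k') + \frac{1}{N'} v(k)
    \end{pmatrix}
\]

so that

\[
H^* \begin{pmatrix} \epsilon \\ \lambda \end{pmatrix}
  = \begin{pmatrix}
      \epsilon^1 \\
      \lambda^1 + \frac{1}{N'} \lambda^2 + \frac{1}{N} \epsilon^2 - z_\# \epsilon
    \end{pmatrix}.
\]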
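
The formula for $H^*$ can be recovered by pairing $H(u,v)$ against a pair of
measures and regrouping terms. In the sketch below it is assumed that
superscripts 1 and 2 denote the first and second marginals of a measure on a
product space, and that $z_\# \epsilon$ is the push-forward of $\epsilon$
through $z$; these conventions are inferred rather than quoted.

```latex
% Pair H(u,v) with (epsilon, lambda), then collect the terms in u and v.
% Assumed notation: epsilon^1, epsilon^2 (resp. lambda^1, lambda^2) are the
% marginals in the first and second variable; z_# is push-forward by z.
\begin{align*}
\langle H(u,v), (\epsilon,\lambda) \rangle
 &= \int \Big[u(a) + \tfrac{1}{N} v(k) - v(z(a,k))\Big]\epsilon(da,dk)
  + \int \Big[v(k') + \tfrac{1}{N'} v(k)\Big]\lambda(dk',dk) \\
 &= \int u \, d\epsilon^1
  + \int v \, d\Big(\tfrac{1}{N}\epsilon^2 - z_\#\epsilon
      + \lambda^1 + \tfrac{1}{N'}\lambda^2\Big) \\
 &= \langle (u,v), H^*(\epsilon,\lambda) \rangle .
\end{align*}
```

Read this way, requiring $H^*(\epsilon,\lambda)$ to equal a prescribed pair
of measures is the same as imposing one identity on the first marginal of
$\epsilon$ and one steady-state identity linking $\lambda$ and $\epsilon$,
each tested against continuous functions.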

Notice $\psi_\delta$ is continuous, while taking $u, v$ large and constant
makes $\phi \circ H$ finite. With these definitions (53) therefore asserts:

\begin{align*}
LP_*(\delta)
 &\le \inf_{u \in C(\bar A),\; v \in C(\bar K)}
      \psi_\delta(u,v) + \phi(H(u,v)) \\
 &= \max_{\epsilon \ge 0 \text{ on } \bar A \times \bar K,\;
          \lambda \ge 0 \text{ on } \bar K \times \bar K}
      -\psi_\delta^*(H^*(\epsilon,\lambda)) - \phi^*(-\epsilon,-\lambda) \\
 &= LP^*(\delta).
\end{align*}

Here we have an inequality rather than the desired equality because the
definition of $LP_*(\delta)$ involves minimizing over a broader class of
feasible functions which need neither be continuous nor bounded. For such
functions however, Proposition 8 asserts the opposite inequality, to conclude
the proof of the theorem. ■